Understanding Data De-Identification: A Practical Guide

October 27, 2025 | Guidance

This document from Ontario’s privacy watchdog explains how organizations can protect people’s privacy while still using valuable data. Here’s what you need to know:

What is De-identification?

Think of de-identification as removing the “name tags” from data so you can’t easily figure out who someone is. It’s like publishing survey results without being able to trace answers back to specific people.

Two key processes:

Pseudonymization: Replacing obvious identifiers (names, addresses, phone numbers) with codes or removing them entirely
De-identification: Going further to also disguise subtle details (like birthdates, postal codes) that could still identify someone when combined
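
As a rough illustration of the pseudonymization step described above, the sketch below replaces direct identifiers with a keyed code. The field names, the key, and the choice of HMAC-SHA256 are assumptions made for this example, not something the guidance prescribes.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset defines its own direct identifiers.
DIRECT_IDENTIFIERS = ("name", "address", "phone")


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Replace direct identifiers with one keyed code; keep all other fields."""
    # Build a stable input string from the direct identifiers.
    identity = "|".join(str(record.get(field, "")) for field in DIRECT_IDENTIFIERS)
    # A keyed hash (HMAC) yields the same code for the same person, but the
    # code cannot be recomputed by anyone who lacks the key.
    digest = hmac.new(secret_key, identity.encode("utf-8"), hashlib.sha256)
    code = digest.hexdigest()[:16]
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    return {"pseudonym": code, **kept}


if __name__ == "__main__":
    key = b"example-key-keep-this-secret"  # placeholder; store real keys securely
    row = {"name": "Jane Doe", "address": "1 Main St", "phone": "555-0100",
           "birth_year": 1985, "postal_code": "M5V 2T6"}
    print(pseudonymize(row, key))
```

Note that the output still carries the birth year and postal code. That is why pseudonymization alone is only the first of the two processes.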

The Core Principle: Risk-Based Approach

De-identification doesn’t make re-identification impossible; it makes it very unlikely. The goal is to reduce the risk to a “very low” level based on what’s reasonably foreseeable, not to achieve zero risk.

Public vs. Private Data Sharing

Public release (like open government data):

Assumes anyone might try to identify people
Requires heavy data transformation
No practical way to enforce rules on users

Private sharing (with specific partners):

Can assess who’s receiving the data
Uses contracts and security measures
Requires less data distortion because controls provide protection

Key Concepts Explained

Direct identifiers: Obvious personal details like names, addresses, health card numbers

Indirect identifiers: Details that seem harmless alone but can identify someone when combined, like birth year, gender, postal code, profession, or education level

The privacy-utility tradeoff: The more you protect privacy by changing data, the less useful it becomes. The art is finding the right balance.
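
To make the point about indirect identifiers concrete, here is a minimal sketch that counts how many records are unique on a given combination of fields. The records and field names are invented for illustration.

```python
from collections import Counter

# Made-up records for illustration only.
RECORDS = [
    {"birth_year": 1985, "gender": "F", "postal_code": "M5V"},
    {"birth_year": 1985, "gender": "M", "postal_code": "M5V"},
    {"birth_year": 1985, "gender": "F", "postal_code": "K1A"},
    {"birth_year": 1990, "gender": "F", "postal_code": "M5V"},
    {"birth_year": 1990, "gender": "M", "postal_code": "K1A"},
    {"birth_year": 1990, "gender": "M", "postal_code": "M5V"},
]


def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


if __name__ == "__main__":
    print(unique_fraction(RECORDS, ["birth_year"]))                           # 0.0
    print(unique_fraction(RECORDS, ["birth_year", "gender", "postal_code"]))  # 1.0
```

In this toy data no single field singles anyone out, yet every record is unique once the three fields are combined. Dropping or coarsening a field lowers that uniqueness at the cost of detail, which is the tradeoff in miniature.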

The 12-Step Process

Organizations should:

Get expert help – This is technical work requiring specialized knowledge
Define clear purposes – Know why you’re sharing data and with whom
Determine release type – Public or controlled sharing?
Classify your data – Identify which fields could reveal identities
Remove obvious identifiers – Pseudonymize first
Set risk thresholds – More sensitive data requires stricter protection (typically keeping re-identification risk below 5-9%)
Measure vulnerability – Calculate how identifiable the data is
Assess attack likelihood – For private sharing, evaluate recipient’s security
Transform the data – Generalize, suppress, or add noise to reduce risk
Check usefulness – Ensure data still serves its purpose
Document everything – Create records of your decisions and methods
Monitor on an ongoing basis – Re-assess every 2-3 years as new data sources emerge and technology changes
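
The threshold, vulnerability, and attack-likelihood steps above can be sketched in a few lines. This is a simplified model under stated assumptions: risk is taken as 1 divided by the size of the smallest group of records sharing the same indirect identifiers, and it is multiplied by an assumed attack likelihood. The function names, the sample data, and the 0.5 likelihood are hypothetical, and the thresholds simply reuse the 5-9% range mentioned in the list.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same values."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())


def overall_risk(data_risk, attack_likelihood=1.0):
    """Scale data risk by the assumed chance that an attack is attempted.

    Public release: use 1.0 (assume someone will try).
    Private sharing: a lower value may be justified by the recipient's controls.
    """
    return data_risk * attack_likelihood


def meets_threshold(risk, threshold):
    """True when the estimated risk is at or below the chosen threshold."""
    return risk <= threshold


if __name__ == "__main__":
    # Made-up data: 12 records in one group, 20 in another.
    data = ([{"decade": "1980s", "region": "M"}] * 12
            + [{"decade": "1990s", "region": "M"}] * 20)
    risk = max_reidentification_risk(data, ["decade", "region"])
    print(meets_threshold(risk, 0.09))                      # True  (1/12 is about 0.083)
    print(meets_threshold(risk, 0.05))                      # False
    print(meets_threshold(overall_risk(risk, 0.5), 0.05))   # True
```

The last line shows why private sharing can need less distortion: with controls that plausibly halve the chance of an attack, the same data clears a stricter threshold. Real assessments rely on expert judgment to set both numbers.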

Common Protection Techniques

Generalization: Changing “born in 1985” to “born 1980-1989”
Suppression: Removing unusual values that make someone stand out
Adding noise: Slightly randomizing numbers like dates or amounts
Synthetic data: Using AI to create fake but realistic data that maintains patterns without matching real people
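
The first three techniques are simple enough to sketch directly. The helper names and parameters below are illustrative assumptions. Synthetic data generation is left out because it depends on a trained model.

```python
import random
from collections import Counter


def generalize_birth_year(year: int) -> str:
    """Generalization: map an exact year to its decade range."""
    start = year - year % 10
    return f"{start}-{start + 9}"


def suppress_rare(values, min_count):
    """Suppression: replace values seen fewer than `min_count` times with None."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else None for v in values]


def add_noise(amount: float, max_shift: float, rng: random.Random) -> float:
    """Noise: shift a number by a random amount within +/- max_shift."""
    return amount + rng.uniform(-max_shift, max_shift)


if __name__ == "__main__":
    print(generalize_birth_year(1985))                        # 1980-1989
    print(suppress_rare(["nurse", "nurse", "astronaut"], 2))  # ['nurse', 'nurse', None]
    print(add_noise(100.0, 5.0, random.Random()))             # somewhere in 95..105
```

Each helper trades detail for protection: wider ranges, a higher `min_count`, or a larger `max_shift` lower the risk but also make the data less precise.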

For Private Data Sharing

Organizations must implement strong controls including: limiting access to authorized staff only, requiring confidentiality agreements, securing data storage, training employees on privacy, monitoring access through audit logs, and having breach response protocols.

Important Warnings

Simply removing names doesn’t make data safe
Aggregated or summarized data can still be identifiable
Linking multiple de-identified datasets together can dramatically increase re-identification risk
If someone does successfully re-identify data, organizations must verify the claim, notify affected individuals, retrieve datasets where possible, and review their methods
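
The linking warning above can be illustrated with a small sketch. The two releases, the shared `pid` code, and the helper names are all made up for this example. It assumes both datasets carry the same pseudonym, which is one way linkage becomes possible.

```python
from collections import Counter

# Made-up releases that share the same pseudonym codes.
RELEASE_A = [
    {"pid": "p1", "birth_decade": "1980s"},
    {"pid": "p2", "birth_decade": "1980s"},
    {"pid": "p3", "birth_decade": "1990s"},
    {"pid": "p4", "birth_decade": "1990s"},
]
RELEASE_B = [
    {"pid": "p1", "region": "M5V"},
    {"pid": "p2", "region": "K1A"},
    {"pid": "p3", "region": "M5V"},
    {"pid": "p4", "region": "K1A"},
]


def smallest_group(records, fields):
    """Size of the smallest set of records sharing the same values for `fields`."""
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return min(groups.values())


def link(a, b, key):
    """Join two lists of records on a shared key, merging their fields."""
    index = {r[key]: r for r in b}
    return [{**r, **index[r[key]]} for r in a if r[key] in index]


if __name__ == "__main__":
    print(smallest_group(RELEASE_A, ["birth_decade"]))   # 2
    print(smallest_group(RELEASE_B, ["region"]))         # 2
    linked = link(RELEASE_A, RELEASE_B, "pid")
    print(smallest_group(linked, ["birth_decade", "region"]))  # 1
```

Each release on its own hides every person in a group of two, but the linked data leaves every person alone in a group of one.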

The Bottom Line

De-identification is a specialized process that balances privacy protection with data utility. It requires technical expertise, careful documentation, ongoing monitoring, and, for private sharing, strong contractual and security controls. When done properly, it allows valuable data to be used for research, innovation, and public good while keeping people’s personal information protected.

Newport Thomson
