
Stop Talking About Personal Data

Your stakeholders deserve better mental models—focus on outcomes, not semantics.

Introduction

I try to avoid talking about personal data. Why? There be dragons.

Personal data is any information that relates to an identified or identifiable natural person. Seems simple enough—why avoid it?

Did you see your stakeholders' eyes glaze over when you recited that ancient script?

Personal data is important but comes with serious baggage.

Stakeholders rarely have good mental models of what personal data is—even if they think they do. This means handling of personal data goes under-reported, which leads to worse outcomes for people and your organization.

Why does this happen? They had a mental model that failed.

Impact of Failed Mental Models

Privacy practitioners love to ask whether stakeholders process personal data.

Why? Our job depends on the answer. No personal data? No problem 🚀

What if stakeholders answer incorrectly? Any privacy work probably stops there.

For most organizations, this means that a primary (if weak) mechanism for identifying privacy requirements and deficiencies has failed. It fails for one of two reasons: 1) the stakeholders' mental model of personal data was wrong, or 2) they deliberately dodged the question.

Every time this mental model fails, you won't satisfy privacy requirements for that use case. This means you won't be able to fulfill data subjects' rights, implement retention periods, produce regulatory-required documentation, etc.

The first time your privacy team hears about this will be from the New York Times.

Not good.

Baggage of 'Personal Data'

Stakeholders—especially engineers—love to argue semantics.

This leads to countless hours spent arguing over what counts as identified vs. identifiable, why removing a few columns isn't sufficient anonymization, and so on.

💡
Where there's a will to avoid privacy work, there must be a way through semantics.

A few of my favorite refrains include:

  1. I don't have any PII
  2. I don't have personal data, I only handle internal identifiers
  3. I don't have personal data, I only process data about employees
  4. I've removed all the identifiers from the data set, we good?

Talking about personal data opens Pandora's Semantic Box. But how to avoid it?

There must be a better way.

Building Better Mental Models

If you can't say personal data, what do you call all that shtuff?

I've found that persona-centric and category-based data classification work best.

For persona-centric data classification, prefer labels like Customer Data or Employee Data. Use the persona of the individual to describe their data.

For category-based data classification, use terms like Financial Data or Biometric Data instead of sensitive personal data. These terms are more precise, and you can attach specific requirements or restrictions to each of these more intuitive categories.
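To make this concrete, here is a minimal sketch of how persona and category labels could drive requirements in a data inventory. Everything in it is an assumption for illustration: the `Persona` and `Category` enums, the `DataSet` record, the `requirements_for` helper, and especially the example requirements, which would come from your own policies and legal analysis rather than from this post.

```python
from dataclasses import dataclass
from enum import Enum


class Persona(Enum):
    """Whose data is it? Labels mirror the persona-centric terms above."""
    CUSTOMER = "Customer Data"
    EMPLOYEE = "Employee Data"


class Category(Enum):
    """What kind of data is it? Labels mirror the category-based terms above."""
    FINANCIAL = "Financial Data"
    BIOMETRIC = "Biometric Data"


# Hypothetical requirements, for illustration only.
PERSONA_REQUIREMENTS = {
    Persona.CUSTOMER: {"honor deletion requests"},
    Persona.EMPLOYEE: {"restrict access to HR"},
}
CATEGORY_REQUIREMENTS = {
    Category.FINANCIAL: {"encrypt at rest"},
    Category.BIOMETRIC: {"collect explicit consent", "encrypt at rest"},
}


@dataclass(frozen=True)
class DataSet:
    """A data set tagged with business-friendly labels instead of 'personal data'."""
    name: str
    personas: frozenset
    categories: frozenset


def requirements_for(data_set):
    """Return the sorted union of requirements implied by a data set's labels."""
    requirements = set()
    for persona in data_set.personas:
        requirements |= PERSONA_REQUIREMENTS[persona]
    for category in data_set.categories:
        requirements |= CATEGORY_REQUIREMENTS[category]
    return sorted(requirements)
```

A stakeholder who tags a data set as Employee Data plus Biometric Data gets a concrete list of obligations back, without anyone having to settle whether the data is "personal" first.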

Why do I prefer these mental models? Why do they work?

  1. They embed privacy requirements in business context
  2. They're more readily understood by non-privacy experts
  3. They avoid historical definition baggage, e.g., PII
  4. They help generalize across differences between laws and regulations
  5. They augment traditional security data classifications

Do you want to argue semantics or focus on actionable privacy outcomes?

Wrapping Up

I sincerely hope this post was accessible, useful, and practical for you. If you have any feedback on this post, please let me know. Cheers.