Recap: PEPR 2021 — Privacy at Scale
Overview
This post is the first in a seven-post series recapping the PEPR 2021 conference. If you're wondering what PEPR is or want to see the other PEPR conference recaps, check out this post!
The three PEPR 2021 talks on Privacy at Scale are:
- Privacy for Infrastructure: Addressing Privacy at the Root
- Cryptographic Privacy-Enhancing Technologies in the Post-Schrems II Era
- Detecting and Handling Sensitive Information in a Data Platform
Privacy for Infrastructure: Addressing Privacy at the Root
In Privacy for Infrastructure: Addressing Privacy at the Root, Joshua O'Madadhain and Gary Young detail why infrastructure privacy reviews are necessary, the types of privacy problems you may encounter, warning signs to look for, and the future of infrastructure privacy reviews.
For this talk, infrastructure is broadly defined as "systems that provide other systems, or products, with capabilities" like storage, networking, and data processing systems. Infrastructure privacy reviews ensure privacy solutions are built at the root and scale to benefit all users of a given infrastructure—failure to do so creates a well-scaled privacy problem.
In addition to the questions asked in a product-focused privacy review, infrastructure reviews should examine how the infrastructure helps its clients meet their data handling needs. One should consider the client-provided and system-generated data maintained by the infrastructure, as well as the categories of data in scope (e.g., personal data) and their current, planned, and possible uses. One should also consider how access to the system is controlled and how clients control access to their data, and ensure proper audit logging, retention, and deletion of data.
When reviewing infrastructure, one should also identify how much configuration clients need to achieve a good privacy stance. Should clients be incapable of reaching a bad privacy stance, or is a good privacy stance even possible? Perhaps clients' options fall somewhere in the middle, with good but configurable default values. Regardless of the choices afforded to clients, these details, trade-offs, and sharp edges should be clearly documented to avoid bad privacy outcomes.
Gary also shares seven key warning signs and tips to look out for in infrastructure privacy reviews:
- Do not negotiate with infrastructure teams only through clients
- Do not evaluate infrastructure only using product-focused methods
- Do document infrastructure standards and expectations
- Do not assume off-the-shelf infrastructure satisfies privacy needs
- Do ensure the goals of the infrastructure team and clients are aligned
- Beware of infrastructure capable of many things—with great power comes great vulnerabilities
- Beware of uncontrolled externalization of privacy costs onto clients
So what's next in the future of infrastructure privacy reviews? Gary sees a shift toward systemization, the need for infrastructure-oriented risk frameworks, and the implementation of annotations and automation.
Systemization includes identifying, documenting, and applying common solutions which push privacy requirements deep into the technology stack.
Infrastructure-oriented risk frameworks should provide a common language for evaluation and highlight the costs associated with scaling infrastructures with privacy issues.
Finally, annotation and automation help you discover and report bad configurations which fail to meet privacy goals, as well as enforce good configurations based on the type of data e.g., medical data.
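To make the annotation and automation idea concrete, here is a minimal sketch of a configuration check driven by a data-type annotation. Everything in it is an assumption for illustration: the `POLICY` table, the `StoreConfig` fields, and the thresholds are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical policy keyed by data-type annotation (illustrative values only).
POLICY = {
    "medical": {"max_retention_days": 30, "require_encryption": True},
    "general": {"max_retention_days": 365, "require_encryption": False},
}


@dataclass
class StoreConfig:
    """A client's configuration for a hypothetical storage system."""
    name: str
    data_type: str       # annotation declared by the client, e.g. "medical"
    retention_days: int
    encrypted: bool


def find_violations(config: StoreConfig) -> List[str]:
    """Return a description of each way the config misses the policy."""
    rules = POLICY.get(config.data_type)
    if rules is None:
        # An unrecognized annotation is itself reported as a problem.
        return [f"{config.name}: unknown data type '{config.data_type}'"]

    violations = []
    if config.retention_days > rules["max_retention_days"]:
        violations.append(
            f"{config.name}: retention of {config.retention_days} days exceeds "
            f"the {rules['max_retention_days']}-day limit for {config.data_type} data"
        )
    if rules["require_encryption"] and not config.encrypted:
        violations.append(
            f"{config.name}: {config.data_type} data must be encrypted"
        )
    return violations


if __name__ == "__main__":
    bad = StoreConfig("patient-notes", "medical", retention_days=90, encrypted=False)
    for problem in find_violations(bad):
        print(problem)
```

A check like this could run in reporting mode first, to discover bad configurations, and later be promoted to a gate that rejects them.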
Cryptographic Privacy-Enhancing Technologies in the Post-Schrems II Era
In Cryptographic Privacy-Enhancing Technologies in the Post-Schrems II Era, Sunny Seon Kang discusses the Schrems II decision and how it invalidated the EU-US Privacy Shield Framework. The Privacy Shield was frequently used as a mechanism for transferring data between the EU and the US. Following the decision, the European Data Protection Board (EDPB) specified supplementary measures, like Secure Multiparty Computation (MPC), that would allow these data transfers to continue.
Currently, many organizations rely on Standard Contractual Clauses (SCCs) and/or Binding Corporate Rules (BCRs) to transfer data between the EU and the US. SCCs and BCRs are contracts and policies that bind private organizations to adhere to a set of data practices.
While SCCs and BCRs are necessary, in light of the Schrems II decision they do not always provide sufficient guarantees on their own. Sunny notes that, just as companies sometimes fail to honor their privacy notices, SCCs and BCRs are fallible and by themselves do not provide direct technical measures to prevent misuse of, and unauthorized access to, data.
To address the shortcomings with SCCs and BCRs, the EDPB specified several supplementary measures (additional technical safeguards) that may be needed to ensure the continued transfer of data between the EU and the US. Sunny suggests the following process when evaluating EU/US data transfers:
- Identify transfers of personal data to third countries
- Identify the transfer tools you are relying on
- Assess the legal framework of the third country
- Identify additional safeguards
- Adopt the necessary procedures
- Regularly monitor and review the adequacy of the protective measures
You will likely identify the need for, and the type of, supplementary measures during the third and fourth steps (assessing the legal framework of the third country and identifying additional safeguards). By that point, you will have assessed the legal landscape and identified any potential legal issues that may necessitate additional technical safeguards.
One of the supplementary measures you or your business may consider implementing is Secure Multiparty Computation (MPC).
At a high level, MPC enables multiple participants to securely compute functions on a shared dataset without being able to see the underlying data contributed by other participants. For example, a group of friends could compute the average salary amongst them without ever revealing their individual salaries to one another.
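The salary example can be sketched with additive secret sharing, one of the simplest building blocks used in MPC. This is a single-process simulation under stated assumptions, not a production protocol and not the method presented in the talk: the `MODULUS` value and function names are hypothetical, participants are assumed to follow the protocol honestly, and in a real deployment each share would travel over a private channel between participants.

```python
import secrets
from typing import List

# Assumed modulus; it must be larger than the sum of all salaries.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_average(salaries: List[int]) -> float:
    """Simulate each friend sharing their salary and combining partial sums."""
    n = len(salaries)
    # Friend i splits their salary; share j is sent to friend j.
    all_shares = [make_shares(salary, n) for salary in salaries]
    # Friend j only ever sees column j, which looks random on its own.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Publishing the partial sums reveals only the total.
    total = sum(partial_sums) % MODULUS
    return total / n


if __name__ == "__main__":
    print(secure_average([50_000, 60_000, 70_000]))
```

Note that the result itself can leak information: with only two participants, each can derive the other's salary from the average, so the example only makes sense for three or more friends.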
Privacy-preserving technologies like MPC help limit the security and privacy risks that may be inherent when transferring data to a third party or a third country in the case of EU/US data transfers. These technologies also satisfy elements of the Fair Information Practice Principles like data minimization, purpose limitation, storage limitation, and privacy by design.
The supplementary measures proposed by the EDPB are being adopted slowly. While the EDPB recommends MPC, there is little precedent regarding its adoption and wider applicability to diverse use cases. As a result, companies are hesitant to invest in and adopt technical measures that are complex, expensive, and not necessarily guaranteed to satisfy regulations.
Detecting and Handling Sensitive Information in a Data Platform
In Detecting and Handling Sensitive Information in a Data Platform, Megha Arora and Mihir Patil describe a solution for detecting, classifying, and managing sensitive information. Organizations frequently struggle to build an accurate data inventory that describes what data the organization has, how it is processed, how long it's retained, and who has access to it. Megha and Mihir's solution to these problems provides a flexible approach that can be applied across companies, industries, and use cases.
The solution described in this talk allows companies to (1) detect sensitive data regardless of where it comes from or what it looks like, and (2) manage that data by applying access controls and data minimization techniques. As a core requirement of the platform, companies should also have a simple but customizable way of defining what sensitive data is, as well as specifying what should be done when it is found (e.g., hashing or encrypting it).
Mihir describes two mechanisms they use to identify sensitive information: regular expressions and value overlap.
Regular expressions are used to evaluate structured data (data that has a set schema) and can evaluate column names as well as the underlying data values—users can specify whether one or both of these regular expressions must match.
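As a rough illustration of regex-based detection on structured data, the sketch below checks a column's name and its values against separate patterns and lets the caller decide whether one or both must match. The `RegexRule` class, the `min_match_fraction` threshold, and the example patterns are assumptions made for this post, not details of the platform described in the talk.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RegexRule:
    """Hypothetical rule: one pattern for column names, one for values."""
    label: str
    column_pattern: str
    value_pattern: str
    require_both: bool = True  # if False, either match is enough


def column_is_sensitive(
    rule: RegexRule,
    column_name: str,
    values: List[str],
    min_match_fraction: float = 0.8,  # assumed threshold, tune per dataset
) -> bool:
    """Decide whether a column matches the rule by name and/or by values."""
    name_hit = re.search(rule.column_pattern, column_name, re.IGNORECASE) is not None

    # Ignore empty cells, then require most remaining values to match fully.
    non_empty = [v for v in values if v]
    matches = sum(1 for v in non_empty if re.fullmatch(rule.value_pattern, v))
    value_hit = bool(non_empty) and matches / len(non_empty) >= min_match_fraction

    if rule.require_both:
        return name_hit and value_hit
    return name_hit or value_hit


if __name__ == "__main__":
    ssn_rule = RegexRule("us_ssn", r"ssn|social", r"\d{3}-\d{2}-\d{4}")
    print(column_is_sensitive(ssn_rule, "customer_ssn", ["123-45-6789", ""]))
```

Requiring both matches reduces false positives, while allowing either one catches sensitive values hiding in columns with uninformative names.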
While regular expressions aim to identify a known sequence of characters, value overlap may be useful for discovering exact matches to known names or identifiers in unstructured data. An important distinction is that value overlap requires users to maintain a set of known sensitive data to compare against while regular expressions look for data that matches a known pattern.
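Value overlap can be sketched as a lookup of known sensitive values inside free text. The function below is a simplified assumption of how such a check might work: it does a case-insensitive, whole-token search for each known value, and the sample names and identifiers are made up.

```python
import re
from typing import Iterable, Set


def find_value_overlap(text: str, known_values: Iterable[str]) -> Set[str]:
    """Return the known sensitive values that appear verbatim in the text."""
    lowered = text.lower()
    found = set()
    for value in known_values:
        if not value:
            continue  # an empty value would trivially match everywhere
        # Require non-word characters (or the text edge) on both sides so
        # that "ACC-1001" does not match inside "ACC-10010".
        pattern = r"(?<!\w)" + re.escape(value.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(value)
    return found


if __name__ == "__main__":
    known = {"Jane Doe", "ACC-1001", "John Smith"}
    note = "Ticket opened by Jane Doe regarding account ACC-1001."
    print(sorted(find_value_overlap(note, known)))
```

Scanning once per known value is fine for a small set. A large inventory of known values would likely call for an index or a multi-pattern matcher instead.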
Once sensitive data has been found using regular expressions or value overlap, users can specify what should be done with that data. The solution proposed by Megha and Mihir provides the ability to apply access controls or data minimization techniques.
When applying an access control, the system creates a "security marking" that is associated with the data. This security marking ensures that only people with access to that specific marking can access the underlying data. Alternatively, if users apply data minimization techniques, they can obfuscate the underlying data by encrypting, hashing, or redacting it.
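Both handling options can be sketched in a few lines. The code below is an illustration under assumptions, not the platform's implementation: the function names are hypothetical, hashing uses a keyed HMAC-SHA256 so that common values are harder to guess from the digest, and encryption is left out because Python's standard library has no symmetric cipher.

```python
import hashlib
import hmac
from typing import Dict, Set


def redact(value: str) -> str:
    """Replace the value entirely."""
    return "[REDACTED]"


def keyed_hash(value: str, key: bytes) -> str:
    """Hash the value with a secret key (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(
    record: Dict[str, str], sensitive_fields: Set[str], action: str, key: bytes
) -> Dict[str, str]:
    """Return a copy of the record with the chosen action applied."""
    if action not in ("redact", "hash"):
        raise ValueError(f"unsupported action: {action}")
    result = dict(record)
    for field in sensitive_fields & record.keys():
        if action == "redact":
            result[field] = redact(record[field])
        else:
            result[field] = keyed_hash(record[field], key)
    return result


def can_read(user_markings: Set[str], data_markings: Set[str]) -> bool:
    """A user may read data only if they hold every marking attached to it."""
    return data_markings <= user_markings


if __name__ == "__main__":
    row = {"name": "Jane", "email": "jane@example.com"}
    print(minimize(row, {"email"}, "redact", key=b"demo-key"))
    print(can_read({"pii", "finance"}, {"pii"}))
```

Keyed hashing keeps equal values equal, so joins and counts still work, whereas redaction removes the value altogether. Which trade-off is acceptable depends on how the data will be used downstream.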
If you're interested in hearing a bit more about the solution's shortcomings or some key takeaways from Megha and Mihir, check out the talk!
Wrapping Up
I hope these posts have piqued your interest in PEPR 2021 and future iterations of the conference. Don't forget to check out the other Conference Recaps for PEPR 2021 as well!
If you liked this post (or have ideas on how to improve it), I'd love to know as always. Cheers!