Recap: PEPR 2020 — Data Governance

Missed PEPR 2020 and want a recap of the data governance talks? Read this.

Overview

This post is the first in a seven-post series. To see all the posts in the series, check out Recap: Privacy Engineering Practice and Respect 2020.

This post covers the first session of PEPR 2020, which focused on data governance. While many definitions exist, data governance is an approach to organizing one or more aspects of data management efforts, including business intelligence, data security and privacy, master data management, and data quality management (Microsoft).

The three PEPR 2020 talks on data governance are:

  1. Beyond Access: Using ABAC Frameworks to Implement Privacy and Security Policies
  2. Privacy Architecture for Data-Driven Innovation
  3. Responsible Design through Experimentation: Learnings from LinkedIn

Beyond Access: Using ABAC Frameworks to Implement Privacy and Security Policies

In Beyond Access: Using ABAC Frameworks to Implement Privacy and Security Policies, Amanda Walker provides an introduction to Attribute-Based Access Control (ABAC) and how it applies to privacy and security policies. Generally, ABAC specifications define four key attributes: Subject, Action, Object, and Context. Together, these attributes define which subjects can take particular actions on certain objects, in a given context.
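
To make the four attributes concrete, here is a minimal sketch of an ABAC decision in Python. It is not taken from the talk: the `Request` class, the `is_permitted` helper, the example rule, and the attribute values ("analyst", "pii", "fraud-detection") are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Request:
    """One access request, described by the four ABAC attribute groups."""
    subject: Dict[str, str]  # who is asking, e.g. {"role": "analyst"}
    action: str              # what they want to do, e.g. "read"
    obj: Dict[str, str]      # what they want to touch, e.g. {"classification": "pii"}
    context: Dict[str, str]  # the circumstances, e.g. {"purpose": "fraud-detection"}


# A rule is any predicate over a request (hypothetical representation).
Rule = Callable[[Request], bool]


def analysts_read_pii_for_fraud(req: Request) -> bool:
    """Hypothetical rule: analysts may read PII, but only for fraud detection."""
    return (
        req.subject.get("role") == "analyst"
        and req.action == "read"
        and req.obj.get("classification") == "pii"
        and req.context.get("purpose") == "fraud-detection"
    )


def is_permitted(req: Request, rules: List[Rule]) -> bool:
    """Deny by default; permit only when at least one rule matches."""
    return any(rule(req) for rule in rules)


RULES: List[Rule] = [analysts_read_pii_for_fraud]

fraud_request = Request(
    {"role": "analyst"}, "read", {"classification": "pii"}, {"purpose": "fraud-detection"}
)
marketing_request = Request(
    {"role": "analyst"}, "read", {"classification": "pii"}, {"purpose": "marketing"}
)
```

In this sketch the same subject, action, and object are permitted for one purpose and denied for another, which is the kind of decision that identity or group membership alone cannot express.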

Amanda covers two major models for enforcing ABAC: Object Attributes and a Policy Service. Each has trade-offs in speed, centralization vs. decentralization, static vs. dynamic attributes, etc. In practice, you need a combination of both.

ABAC can help answer access questions like "should this computation proceed?" or "should this computation include this data?" To answer these questions, one should consider the purpose of the operation, the jurisdictions involved, public policy, and internal policy. It would be difficult to make these determinations based on data classification, group membership, or identity alone.
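
As a rough illustration of the second question, the sketch below filters the inputs to a computation by purpose and jurisdiction. Everything here is an assumption made for the example: the `Record` fields, the `should_include` and `select_inputs` helpers, and the "EU"/"US" and "billing"/"analytics" labels do not come from the talk.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Record:
    user_id: str
    jurisdiction: str           # hypothetical label, e.g. "EU" or "US"
    allowed_purposes: Set[str]  # hypothetical: purposes this record may serve


def should_include(record: Record, purpose: str, cleared: Set[str]) -> bool:
    """Answer "should this computation include this data?" for one record.

    `cleared` is the set of jurisdictions the computation is allowed to use.
    """
    return purpose in record.allowed_purposes and record.jurisdiction in cleared


def select_inputs(records: Iterable[Record], purpose: str, cleared: Set[str]) -> List[Record]:
    """Keep only the records that may take part in this computation."""
    return [r for r in records if should_include(r, purpose, cleared)]


records = [
    Record("u1", "EU", {"billing"}),
    Record("u2", "US", {"billing", "analytics"}),
    Record("u3", "EU", {"analytics"}),
]

# An analytics job that is only cleared to process US data.
analytics_inputs = select_inputs(records, "analytics", {"US"})
```

The decision is made per record and depends on why the job is running and where the data comes from, not on who launched it.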

Privacy Architecture for Data-Driven Innovation

In Privacy Architecture for Data-Driven Innovation, Derek Care and Nishant Bhajaria share why Uber created a data inventory, the steps they took, what their system architecture looks like, and their lessons learned.

They start by saying that companies tend to collect a lot of data about users and their business but are unable to effectively measure the risk of collecting this data. Businesses fail to protect data preemptively, lose track of the data they have, and cannot make informed decisions about data sharing, e.g., what data is okay to share, when, and with whom.

Through this journey, Uber came to appreciate how much discovery is needed to complete a data inventory for a large-scale company. They learned about the sheer volume and diverse types of data that they did not know they had. There's also an inflection point where collecting or replicating too much data becomes unsustainable and granular deletion plans are required. Finally, their data inventory allowed them to identify data quality issues, i.e., data that is unnecessarily replicated in multiple locations under different names.
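
One simple way to spot the "same data, different names" problem is to fingerprint column contents and group columns that share a fingerprint. The sketch below is only an illustration of that idea and is not Uber's approach; the `fingerprint` and `find_duplicates` helpers and the column names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def fingerprint(values: Iterable[str]) -> str:
    """Hash the distinct values of a column, ignoring order and repeats."""
    digest = hashlib.sha256()
    for value in sorted(set(values)):
        digest.update(value.encode("utf-8"))
        digest.update(b"\x00")  # separator so "ab","c" differs from "a","bc"
    return digest.hexdigest()


def find_duplicates(columns: Dict[str, List[str]]) -> List[List[str]]:
    """Group column names whose contents have the same fingerprint."""
    by_fingerprint = defaultdict(list)
    for name, values in columns.items():
        by_fingerprint[fingerprint(values)].append(name)
    return [sorted(names) for names in by_fingerprint.values() if len(names) > 1]


# Hypothetical inventory: two stores hold the same emails under different names.
columns = {
    "users.email": ["a@example.com", "b@example.com"],
    "crm.contact_addr": ["b@example.com", "a@example.com"],
    "trips.city": ["San Francisco"],
}
duplicates = find_duplicates(columns)
```

Exact-match fingerprints only catch full copies. Partial or transformed copies would need sampling or similarity measures, which this sketch does not attempt.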

For a detailed look at how Uber automates their data classification and the creation of a data inventory, check out the talk!

Responsible Design through Experimentation: Learnings from LinkedIn

In Responsible Design through Experimentation: Learnings from LinkedIn, Guillaume Saint-Jacques primarily discusses leveraging the Atkinson inequality measure to determine whether changes made to LinkedIn are affecting groups unequally. While a change may look good "on average," the benefit may be disproportionately skewed toward a specific population, e.g., power users vs. non-power users.

This type of inequality measure is useful because it allows LinkedIn to determine which features create (or close) unintended gaps. This measurement provides insight into experiments that may look neutral but actually provide value by reducing inequality.
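
For readers unfamiliar with the measure, here is a minimal sketch of the standard Atkinson index applied to a per-user metric. It is not LinkedIn's implementation: the `atkinson` function name, the choice of `epsilon = 1.0`, and the toy numbers are assumptions made for illustration.

```python
import math
from typing import Sequence


def atkinson(values: Sequence[float], epsilon: float = 1.0) -> float:
    """Atkinson inequality index: 0 means perfect equality, values near 1 mean high inequality.

    `epsilon` is the inequality-aversion parameter. This sketch requires
    strictly positive values so the logarithm and powers are well defined.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        # Equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(v) for v in values) / n)
    else:
        ede = (sum(v ** (1 - epsilon) for v in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mean


# Toy per-user engagement numbers (hypothetical).
control = [1, 1, 1, 9]        # mean 3
treatment_a = [1, 1, 1, 13]   # mean 4, the gain goes only to the top user
treatment_b = [2, 2, 2, 10]   # mean 4, the gain is shared by everyone

for name, metric in [("control", control), ("A", treatment_a), ("B", treatment_b)]:
    print(name, round(atkinson(metric), 3))
```

Both treatments lift the average by the same amount, but in this toy data treatment A raises the index relative to control while treatment B lowers it, which is the kind of difference an average-only comparison would miss.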

Guillaume also shared some insightful lessons learned. Special care should be taken when considering "business-neutral" experiments, e.g., changes to how the site is rendered. Such changes may have an unequal impact on people with slower internet connections and create an unintended gap between users. LinkedIn found that many "neutral" experiments had some form of inequality impact.

Additionally, push notifications were a great way to reduce inequality by providing instant job notifications to some users and even throttling notifications for highly engaged users. Finally, Guillaume says that both positive and negative inequality impacts are often unintended but are discoverable and measurable using the Atkinson index.

Wrapping Up

I hope this post piqued your interest in PEPR 2020 and future iterations of the conference! If you are interested in other PEPR 2020 sessions, check out Recap: Privacy Engineering Practice and Respect 2020.

If you liked this post (or have ideas on how to improve it), I'd love to know, as always. Cheers!