Google Wants to Help Tech Companies Know Less About You

By Lily Hay Newman | Thu, 05 Sep 2019

By releasing its homegrown differential privacy tool, Google will make it easier for any company to boost its privacy bona fides.

As a data-driven advertising company, Google's business model hinges on knowing as much about its users as possible. But as the public has increasingly awakened to its privacy rights, this imperative has generated more friction. One protection Google has invested in is the field of data science known as "differential privacy," which strategically adds random noise to user information stored in databases so that companies can still analyze it without being able to single people out. And now the company is releasing a tool to help other developers achieve that same level of differential privacy defense.
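That noise-for-privacy trade is easiest to see in a tiny example. The Python sketch below is an illustration of the general technique, not Google's library; the function name, parameters, and epsilon value are all hypothetical. It applies the classic Laplace mechanism to a simple count query:

```python
import numpy as np

def private_count(records, epsilon=1.0):
    """Return a differentially private count of the records."""
    # A count query has sensitivity 1: adding or removing any one
    # person's record changes the true answer by at most 1.
    true_count = len(records)
    # Laplace noise with scale sensitivity/epsilon hides whether any
    # single individual is present; smaller epsilon means more privacy.
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Two databases differing by one person yield statistically
# indistinguishable answers, up to the epsilon privacy budget.
print(private_count(["alice", "bob", "carol"]))
print(private_count(["bob", "carol"]))
```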

Today Google is announcing a new set of open source differential privacy libraries that not only offer the equations and models needed to set boundaries and constraints on identifying data, but also include an interface to make it easier for more developers to actually implement the protections. The idea is to make it possible for companies to mine and analyze their database information without building invasive identity profiles or tracking individuals. The measures can also help mitigate the fallout of a data breach, because stored user data is blended with confounding noise.

"It’s really all about data protection and about limiting the consequences of releasing data," says Bryant Gipson, an engineering manager at Google. "This way, companies can still get insights about data that are valuable and useful to everybody without doing something to harm those users."

"If you want people to use it right you need to put an interface on it that is actually usable by actual human beings."

Lea Kissner, Humu

Google currently uses differential privacy libraries to protect many different types of information, such as the location data generated by Google Fi mobile customers. The techniques also crop up in features like the Google Maps meters that tell you how busy different businesses are throughout the day. Google intentionally built its differential privacy libraries to be flexible and applicable to as many database features and products as possible.

Differential privacy is similar to cryptography in the sense that it's extremely complicated and difficult to do right. And as with encryption, experts strongly discourage developers from attempting to "roll your own" differential privacy scheme, or design one from scratch. Google hopes that its open source tool will be easy enough to use that it can be a one-stop shop for developers who might otherwise get themselves into trouble.

"The underlying differential privacy noisemaking code is very, very general," says Lea Kissner, chief privacy officer of the workplace behavior startup Humu and Google’s former global lead of privacy technology. Kissner oversaw the differential privacy project until her departure in January. "The interface that’s put on the front of it is also quite general, but it’s specific to the use case of somebody making queries to a database. And that interface matters. If you want people to use it right you need to put an interface on it that is actually usable by actual human beings who don’t have a PhD in the area." (Which Kissner does.)

Developers could use Google’s tools to protect all sorts of database queries. For example, with differential privacy in place, employees at a scooter-share company could analyze drop-offs and pickups at different times without knowing exactly who rode which scooter where. Differential privacy also includes protections that keep aggregate data from revealing too much. Take average scooter ride length: even if one user’s data is added or removed, the average won’t shift enough to blow that user’s mathematical cover. Differential privacy builds in many such protections to preserve larger conclusions about trends, no matter how granular the database queries get.
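To make the scooter example concrete, here is a minimal sketch of a differentially private average, under the same caveats: illustrative Python rather than Google's API, with made-up clamping bounds and a common simplification that treats the number of rides as public.

```python
import numpy as np

def private_mean_ride_length(ride_minutes, epsilon=1.0, lo=0.0, hi=60.0):
    """Differentially private average ride length, in minutes."""
    # Clamp every ride to a known range so one marathon rider
    # can't dominate the statistic.
    clamped = np.clip(ride_minutes, lo, hi)
    n = len(clamped)
    # With values bounded in [lo, hi] and n treated as public,
    # swapping one rider's data moves the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / n
    noise = np.random.laplace(0.0, sensitivity / epsilon)
    return float(np.mean(clamped) + noise)
```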

Part of the reason it's so difficult to roll your own differential privacy is that these tools, like encryption schemes, need to be vetted by as many people as possible to catch all the flaws and conceptual issues that could otherwise go unnoticed. Google's Gipson says this is why it was such a priority to make the tool open source; he hopes that academic and technical communities around the world will offer feedback and suggestions about improving Google's offering.

Uber similarly released an open source differential privacy tool in 2017 in collaboration with researchers at UC Berkeley and updated it in 2018. Apple, meanwhile, uses a proprietary differential privacy scheme. The company has been at the forefront of implementing the technology, but independent researchers have found that its approach may not offer the privacy guarantees Apple claims.

Google says that one novel thing its solution offers is that it doesn't assume any individual in a database is associated with at most one record, the way most other schemes do. That assumption holds for a census or a medical records database, but it often fails for data sets about people visiting particular locations or using their mobile phones in various places around the world. Everyone gets surveyed once for the census, but people often visit the same restaurant or use the same cell tower many times. So Google's tool allows for the possibility that a person contributes multiple records to a database over time, a feature that helps maintain privacy guarantees in a broader array of situations.
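A rough sketch of that contribution-bounding idea, again with hypothetical names and an arbitrary cap of three records per user:

```python
import numpy as np
from collections import defaultdict

def private_visit_count(visits, epsilon=1.0, max_per_user=3):
    """Noisy count of visits, with per-user contributions capped."""
    # visits: a list of (user_id, place_id) pairs. Unlike a census,
    # one person can show up many times, so first cap how many
    # records each user is allowed to contribute.
    per_user = defaultdict(int)
    kept = 0
    for user, _place in visits:
        if per_user[user] < max_per_user:
            per_user[user] += 1
            kept += 1
    # With contributions bounded, removing one person changes the
    # count by at most max_per_user, so noise scaled to
    # max_per_user / epsilon restores the privacy guarantee.
    return kept + np.random.laplace(0.0, max_per_user / epsilon)
```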

Along with the tool itself, Google is also offering a testing methodology that lets developers run audits of their differential privacy implementation and see if it is actually working as intended.
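In principle, such an audit can work by brute force: run a mechanism many times on two databases that differ by a single record and check that no outcome is disproportionately likelier on one than the other. The toy checker below illustrates that idea; it is not Google's actual tester, and every name in it is hypothetical.

```python
import numpy as np

def looks_private(mechanism, db, neighbor, epsilon, trials=200_000, bins=40):
    """Crude empirical check of a differential privacy guarantee."""
    # Sample the mechanism's output on two databases that differ
    # by one record, then histogram the results.
    a = np.array([mechanism(db) for _ in range(trials)])
    b = np.array([mechanism(neighbor) for _ in range(trials)])
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    counts_a, _ = np.histogram(a, bins=edges)
    counts_b, _ = np.histogram(b, bins=edges)
    # Differential privacy demands that no outcome be more than
    # e^epsilon times likelier on one database than the other;
    # the 1.2 factor is slack for sampling error in this toy test.
    mask = (counts_a > 0) & (counts_b > 0)
    ratio = np.maximum(counts_a[mask] / counts_b[mask],
                       counts_b[mask] / counts_a[mask])
    return bool(np.all(ratio <= np.exp(epsilon) * 1.2))
```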

"From our perspective, the more people that are doing differential privacy, inside of Google or outside, the better," Google's Gipson says. "Getting this out into the broader world is the real value here, because even with lots of eyes on a thing you can still miss glaring security holes. And 99.9 percent differentially private is not differentially private."

As with any technique, differential privacy isn't a panacea for all of big tech's security ailments. But given how many problems there are to fix, it's worth having as many researchers as possible chasing that last tenth of a percent.
