Philosophy In line with current policy thinking
openPDS allows users to collect, store, and give fine-grained access to their data all while protecting their privacy.
With the rise of smartphones, their built-in sensors, and web apps, an increasing amount of personal data is being silently collected. Personal data (digital information about users’ location, calls, web searches, and preferences) is undoubtedly the oil of the new economy. However, the lack of access to this data makes it very hard, if not impossible, for individuals to understand and manage the risks associated with it. Therefore, advances in using and mining this data have to evolve in parallel with considerations about ownership and privacy.
Many of the initial and critical steps towards individuals’ data ownership are technological. Given the huge number of data sources that a user interacts with on a daily basis, interoperability is not enough. Rather, the user needs to actually own a secured space, a Personal Data Store (PDS), acting as a centralized location where his data live. Owning a PDS would allow the user to view and reason about the data collected. The user can then truly control the flow of data and manage fine-grained authorizations for accessing his data.
Publications and Press
- On the Trusted Use of Large-Scale Personal Data, IEEE Data Engineering Bulletin, 35-4 (2012)
- Big Data Is Opening Doors, but Maybe Too Many, The New York Times
- ACLU: AT&T Customer Privacy at Risk, Wall Street Journal - CIO Journal
- Private data gatekeeper stands between you and the NSA, New Scientist
- Getting More Value from Cell-Phone Data, Technology Review
- openPDS software focuses on control of personal data, Phys.org
- How to stop the NSA spying on your data, New Scientist
- Why the collision of big data and privacy will require a new realpolitik, GigaOM
- How Big Data Can Transform Society for the Better, Scientific American
- Reinventing Society in the Wake of Big Data, The Edge
- Big Data Analytics: You Have The Right To Remain Private. Or Do You?, Cisco - The Network
We believe that a New Deal on data is needed. When it comes to data, "ownership" should be thought of according to old English common law. Data ownership would therefore be defined as the rights of possession, use, and disposal rather than as literal ownership.
Discussions on such changes and their implications for privacy must also take into account the current political and legal context. We developed openPDS to be the reference implementation of the policies proposed by the National Strategy for Trusted Identities in Cyberspace (NSTIC), the Department of Commerce Green Paper, and the Office of the President’s International Strategy for Cyberspace. The openPDS implementation is also aligned with the European Commission’s 2012 reform of the data protection rules. This reform affirms individuals’ rights to be forgotten, to have easier access to their data, and to be able to transfer it easily. These recommendations, proposed reforms, and regulations all recognize the increasing need for personal data to be under the control of the individual, as he is the one who can best mitigate the associated risks.
The system rules and participation agreements address the need for harmonized business, legal, and technical measures to enable distributed and interoperable systems such as openPDS. The latest versions of the documents are available on our GitHub repository, where the current research and development on the legal and software code is openly available for public access and re-use.
Architecture and SafeAnswers
Protecting the privacy of personal data is known to be a hard problem. Recent advances in collecting, storing, and processing high-dimensional data, such as call or credit card records, at scale make it even harder. The risks associated with these high-dimensional data are often subtle and hard to predict, and anonymizing them is known to be a challenge.
Geospatial data, the second most recorded type of information by smartphone apps, is probably the best example of the risks and rewards associated with high-dimensional data. On the one hand, the number of users of location-aware services such as Google Local Search, Foursquare, and Glancee is rising quickly as these services demonstrate their benefits to users. On the other hand, a recent study showed that four spatio-temporal points (approximate places and times) are enough to uniquely identify 95% of 1.5M people in a mobility database. The study further shows that this result holds even when the resolution of the dataset is low. Therefore, even coarse or blurred datasets provide little anonymity.
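To make the uniqueness claim concrete, here is a minimal sketch that measures, on a toy dataset, how often a handful of a user's own points matches that user and nobody else. The `unicity` function, the traces, and the exhaustive enumeration are illustrative assumptions; they are not the data or the exact procedure of the study mentioned above.

```python
from itertools import combinations

def unicity(traces, k):
    """Fraction of k-point samples that match exactly one user.

    traces: dict mapping a user id to a set of (place, hour) points.
    Every k-point subset of every user's trace is tried (toy scale only).
    """
    total = unique = 0
    for user, points in traces.items():
        for sample in combinations(sorted(points), k):
            total += 1
            # Users whose trace contains every point of the sample.
            matches = [u for u, pts in traces.items() if set(sample) <= pts]
            if matches == [user]:
                unique += 1
    return unique / total if total else 0.0

# Hypothetical traces: three users who share most of their coarse locations.
traces = {
    "alice": {("A", 8), ("B", 12), ("C", 18)},
    "bob":   {("A", 8), ("B", 12), ("D", 18)},
    "carol": {("A", 8), ("E", 12), ("C", 18)},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

Even in this tiny example, the share of uniquely identifying samples grows as more points are known, which is the effect that makes coarse location traces hard to anonymize.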
Only answers, no raw data
We strongly believe that it will be extremely difficult to anonymize high-dimensional data such as geolocation while retaining the value of the data. Consequently, openPDS turns the problem on its head using an innovative SafeAnswers framework. SafeAnswers allows applications to ask questions that will be answered using the user's personal data. In practice, applications will send code to be run against the data, and the answer will be sent back to them. openPDS ships code, not data. openPDS turns a very hard anonymization problem into an easier security problem.
SafeAnswers uses two separate layers for aggregating the user’s data: (1) sensitive data processing takes place within the user’s PDS, allowing the dimensionality of the data to be safely reduced on a per-need basis; (2) data can be anonymously aggregated across users, without the need to share sensitive data with an intermediate entity, through a privacy-preserving group computation method.
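The text does not say which group computation method is used for the second layer. As one possible illustration, the sketch below uses additive secret sharing, a well-known technique for computing a sum across parties without any of them revealing its own value. The function names, the modulus, and the single-process simulation of the parties are all assumptions made for the example.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus; all arithmetic is done mod this value

def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Simulate a group sum in which each party only ever publishes a subtotal."""
    n = len(private_values)
    # Party i splits its value into n shares; share j is meant for party j.
    all_shares = [make_shares(v % MODULUS, n) for v in private_values]
    # Party j adds the shares it received and publishes only that subtotal.
    subtotals = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                 for j in range(n)]
    # The published subtotals add up to the true total.
    return sum(subtotals) % MODULUS

# Hypothetical per-user answers, e.g. minutes of activity computed in each PDS.
print(secure_sum([30, 45, 10]))
```

Each individual share is a uniformly random number, so a single share or subtotal reveals nothing about one user's value; only the total becomes known. A real deployment would also need secure channels between parties and protection against colluding participants, which this sketch leaves out.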
With SafeAnswers, generic computations on user data are performed in the safe environment of the PDS, under the control of the user: the user does not have to hand data over to receive a service. Only the answers (the summarized data the app needs) leave the boundaries of the user’s PDS. Rather than exporting raw accelerometer or GPS data, it could be sufficient for an app to know whether you’re active or which general geographic zone you are currently in. Instead of sending raw accelerometer readings or GPS coordinates to the app owner’s server for processing, that computation can be done inside the user’s PDS by the corresponding Q&A module.
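The question-and-answer flow described above can be sketched as follows. This is a toy model, not the openPDS API: the `PersonalDataStore` class, its method names, the two question modules, and the sample readings are all hypothetical, and the sketch does not sandbox the submitted code, which a real PDS would have to do.

```python
import math
from statistics import pstdev

class PersonalDataStore:
    """Toy PDS: raw data stays inside; apps only ever receive answers."""

    def __init__(self, raw_data):
        self._raw = raw_data     # e.g. {"gps": [...], "accelerometer": [...]}
        self._modules = {}       # question name -> function run inside the PDS
        self._grants = set()     # (app_id, question name) pairs the user approved

    def install_module(self, name, func):
        self._modules[name] = func

    def authorize(self, app_id, name):
        self._grants.add((app_id, name))

    def ask(self, app_id, name):
        # Refuse any question the user has not explicitly granted to this app.
        if (app_id, name) not in self._grants:
            raise PermissionError(f"{app_id} is not authorized to ask {name!r}")
        return self._modules[name](self._raw)

def is_active(raw, threshold=0.5):
    """Answer True/False from accelerometer magnitudes instead of exporting them."""
    return pstdev(raw["accelerometer"]) > threshold

def general_zone(raw, cell_degrees=0.1):
    """Answer with a coarse grid cell index for the latest GPS fix."""
    lat, lon = raw["gps"][-1]
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))

# Hypothetical raw readings held by the user's PDS.
pds = PersonalDataStore({
    "gps": [(42.3601, -71.0942)],
    "accelerometer": [9.8, 12.1, 7.4, 11.6, 8.0],
})
pds.install_module("is_active", is_active)
pds.install_module("general_zone", general_zone)
pds.authorize("fitness_app", "is_active")

print(pds.ask("fitness_app", "is_active"))  # the app sees a boolean, not the readings
```

In this sketch the app is granted a single question, so asking for `general_zone` would raise `PermissionError` until the user authorizes it. Per-app, per-question grants are one simple way to model fine-grained authorization.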
Implementation and preliminary studies
All our code is open-source and available on our GitHub account.