Kaliya Hamlin and Phil Windley have each produced Principles for Personal Data Ecosystems. The two sets are very similar, though they come from different perspectives. They guide our work here at the Personal Data Ecosystem Consortium.
Kaliya’s: Visions and Principles for the Personal Data Ecosystem
The future is at stake. Without control over our own personal data, including a copy of all the digital bread crumbs we leave behind online, we leave ourselves open to being tracked, and potentially manipulated, by commercial interests without our knowledge.
This presents a vision for core aspects of the emerging interoperable, open-standards-based ecosystem of personal data services. That ecosystem is rooted in the core functionality of a Personal Data Store: the vault/locker/services/broker where all of an individual’s data is collected, stored, and managed.
Dignity of the Individual is Core
Human dignity must lie at the core of the Personal Data Ecosystem. People need the freedom to shape how they represent themselves in digital contexts, and how the data they generate in their lives is collected and used.
Systems Must Respect Relationships
Relationships must be respected: between people, between people and groups, and between groups. The Personal Data Ecosystem must respect that people and communities have different levels of publicness. The relationships that people have with one another must be respected, and the social context in which they are formed must be honored.
Remember the Greatness of Groups
Personal data and control over it give people a core human dignity. But it must also be remembered that human social life and human identity are shaped by our participation and membership in groups, the core organizing form of our society. Fundamental functionality must enable people to organize in groups, and it must be abstracted from any particular service or domain.
The Social Web is not Networked Individualism
People broadcasting what they do to their friends or followers does not make a social web; communities and groups do.
Protocols that Enable Broad Possibilities are Essential
Protocols matter deeply: by defining which use cases are and are not possible in a given protocol landscape, they shape what can be built. To have a truly social and dynamic web, there is a role for protocols designed specifically for that purpose, not just for creating web pages or sending email.
Open Standards for Data and Metadata are Essential
It is vital that the personal data store ecosystem be interoperable through open standards, so that people are free to choose which personal data services they use, just as they are free to pick which bank holds their money and provides them with services in the financial realm.
Defaults Must Work for Most People Most of the Time
All systems have defaults. The paradox of choice is that more options can overwhelm people, so they end up not considering the choices they have at all. Real people need to have input into the creation and ongoing development of systemic defaults.
Norms and Practices in the Personal Data Ecosystem Must be Backed up by Law
Emerging technologies need new legal agreements and frameworks designed to match their functionality. The work on the legal framework for this ecosystem is as important as the protocols and code that make it go.
Business Opportunities Abound in this New Personal Data Ecosystem
The paradigm of users collecting, controlling, and managing the personal data they create, implicitly and explicitly, around the web is a huge opportunity for new services and new ways of doing business. Creativity is needed to think through these new possibilities.
Diversity is Key to the Success of the Personal Data Ecosystem
Large companies and nimble startups are both needed for the success of this emerging ecosystem.
Phil’s: PDX Principles
There was a lot of discussion around Personal Data Stores (PDS) and Personal Data Lockers at IIW East. Every time slot on both days had at least one, and sometimes two, sessions on the subject. (As an aside, if you’re not familiar with IIW, the agenda is created in real time by the participants, not months in advance by a program committee, so it represents the interests of the participants more fully than a normal conference agenda might.) I’m confident that this will also be a major theme at the upcoming IIW in Mountain View, CA in November.
The term itself is a problem. When you say “store” or “locker”, people assume that this is a place to put things (not surprisingly). While there will certainly be data stored in the PDS, that misses its primary purposes: acting as a broker for all the data you’ve got stored all over the place, and managing the metadata about that data. That is, it is a single place, but a place of indirection, not storage. The PDS is the place where services that need access to your data come for permission, metadata, and location. The same is true for services that need to give you data.
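The idea of a place of indirection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: `PDXBroker`, `Pointer`, their methods, the service names, and the example URL are all hypothetical. The broker never holds the contact data itself. It answers a permitted service with a location and metadata, and refuses everyone else.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """What the broker returns: where the data lives and what it is, not the data itself."""
    location: str
    metadata: dict


@dataclass
class PDXBroker:
    # Hypothetical registry: data category -> pointer to wherever it is actually stored.
    registry: dict = field(default_factory=dict)
    # (service, category) pairs the user has approved.
    grants: set = field(default_factory=set)

    def register(self, category: str, location: str, **metadata) -> None:
        self.registry[category] = Pointer(location, metadata)

    def grant(self, service: str, category: str) -> None:
        self.grants.add((service, category))

    def resolve(self, service: str, category: str) -> Pointer:
        # Permission is checked before any location is revealed.
        if (service, category) not in self.grants:
            raise PermissionError(f"{service} has no grant for {category}")
        return self.registry[category]


# Usage: the contacts live at a (hypothetical) phone-company service;
# the broker only records where they are and who may go get them.
broker = PDXBroker()
broker.register("contacts", "https://contacts.example.com/alice", format="vCard")
broker.grant("calendar-app", "contacts")
pointer = broker.resolve("calendar-app", "contacts")
```

Because the grant lives in one place, revoking it (or moving the data and updating the pointer) does not require touching every service that uses it.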
Consequently, some have taken to calling it a PDX, where “x” stands for the “variable x.” That is, we don’t know what to call the last thing, so we’ll say “x” and leave it at that.
In the discussions, I started to tease out a few principles that define the PDX and make it something different from just a database where my stuff is. We all have lots of places where data about us is stored, and since it’s personal data, we might think of them as “personal data stores.” But when people at IIW (and elsewhere) use the term, they’re talking about something larger and more capable than just a passive database.
Here’s a list of a few things that I think distinguish a PDX from just places where your personal data is stored:
- user-controlled – the user needs to be in control of the data, who has access to it, and how it is used. Once that data is in my PDX, I make decisions about it. That doesn’t mean the data can’t also be somewhere else. For example, data about my purchases from Amazon will certainly be stored at Amazon and not under my control. But I might also be emailing the receipts to a service that parses them and puts the data in my PDX for my use.
- federated – there isn’t one place where your data is stored, but multiple places that the data needs to be able to flow between, in a permissioned way. There’s no center, just a lot of cooperating systems with my PDX orchestrating the interactions. While Amazon might not give my PDX access to and control over my transactions, my phone company might provide a PDX-capable contact service where I choose to store my contact information.
- interoperable – various PDX services and brokers have to be able to operate together according to standards to perform their roles. When I take money out of my account at Wells Fargo and deposit it at Chase, I don’t lose part of the value because Chase doesn’t know how to handle some part of the transaction. The monetary system is interoperable thanks to standards and, sometimes, shims that connect it all together.
- semantic – a PDX knows more about the data that it holds than existing data stores do. Consider Dropbox. I can put all kinds of things in my Dropbox, but it’s syntactic, not semantic. By that I mean that if I want to put healthcare data in Dropbox and control who uses it, I create a folder and put the data in it with specific permissions. The fact that there is a folder with a certain name located at a particular place in the folder hierarchy is purely syntactic. In a semantic world, the data itself is tagged as healthcare data and no matter where it is, it’s protected according to the policies I’ve put in place.
- portability – a PDX doesn’t trap data in proprietary formats. If my phone company is storing my contact data in the cloud and I decide that I want to move it to my own server or another service, I can, from a technical as well as a policy standpoint. Note that this doesn’t mean we have to wait until thousands upon thousands of data format specifications get hammered out. Semantic metadata can provide a means of translating from one format to another.
- metadata management – one of the primary roles of the PDX is managing data about my data. What are the roles I’ve created? What permissions have I granted as exceptions to the defaults? What semantics surround the various data fields? What data sharing, encoding, and encrypting policies have I created? All of this has to be kept and managed on my behalf in the PDX.
- broker services – the PDX is a place where the user manages a federated network of data stores. As an example of why this is important, consider the shortcomings of OAuth. If I use an application that needs access to four OAuth-mediated APIs, I have to go through the OAuth ceremony with each API provider separately. Now consider that I might have dozens of apps that use a popular API. I have to go through the OAuth ceremony for each of them separately. In short, a broker saves us from the N x M explosion of permissioning ceremonies. The same is true for various data services.
- discoverable – a PDX should make its APIs and schemas discoverable so that any application I’m interested in knows how to interact with it. Discoverability spares users from having to fully specify addresses, mappings, and schemas to every application that comes along.
- automatable and scriptable – a PDX without automation is worse than no PDX at all, because it burdens the user rather than saving effort. A PDX will be a player in a larger ecosystem of services. I don’t see it as a mere API that allows services and applications to GET and PUT data; it’s not WebDAV on steroids. The PDX is an active participant in the greater ecosystem of services that are cooperating on the user’s behalf.
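The difference between syntactic and semantic protection described under “semantic” above can be illustrated with a minimal Python sketch, assuming policies are kept in the PDX’s metadata and keyed on semantic tags attached to each item. `Item`, `POLICIES`, `may_read`, and the service names are hypothetical, not part of any PDX specification. The access decision looks only at an item’s tags and never at its path, so moving a file does not change who can read it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    path: str        # where the item happens to live (syntactic; ignored by policy)
    tags: frozenset  # what the item is (semantic; drives policy)


# Hypothetical policy table held in the PDX metadata: tag -> services allowed to read.
POLICIES = {
    "healthcare": {"my-doctor"},
    "contacts": {"my-doctor", "calendar-app"},
}


def may_read(service: str, item: Item) -> bool:
    """Allow access only if every tag on the item permits this service.

    Untagged items and unknown tags are denied by default (an assumption
    chosen for this sketch). The item's path is never consulted.
    """
    if not item.tags:
        return False
    return all(service in POLICIES.get(tag, set()) for tag in item.tags)


# The same lab report in two different folders gets the same protection.
lab = Item("/Dropbox/misc/scan.pdf", frozenset({"healthcare"}))
moved = Item("/backup/2011/scan.pdf", frozenset({"healthcare"}))
```

Here `may_read("calendar-app", lab)` is `False` and `may_read("my-doctor", moved)` is `True`, regardless of which folder each copy sits in.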