Where Does My API Come From? — Personal Data Ecosystem Consortium

A lot of people step up to the Locker Project and immediately want to start using the data stack to create an application. I’m super excited that people can see the vision and want to build on top of it, but it’s not quite there yet. With some effort, the steps to start pulling data, collecting it, and then building on top of it can be figured out, but the process is far from intuitive or uniform. Once someone gets a working system, the next question is, “How do I use this?” or, “Where is the API?” It can definitely get confusing quickly, so let’s pull back the layers a bit and start to see where the API exists.

One key point to keep in mind while reading this is that internally all of the services are normally communicating via REST + JSON. A few exceptions will be mentioned below.

The Layers

The first piece we need to ponder is how we represent service data in a general way. In the Locker Project we use a MIME-type-like system to document the services. There are higher-level primary types such as message, photo, link, and contact. Under these are more specific types that usually represent an external service or a new, complete data structure. Examples of complete service-types currently in use are contact/google, photo/flickr, and link/chrome. Each service-type defines a set of REST endpoints, the JSON structure they return, and the JSON structure of any events generated.
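To make the naming scheme concrete, here is a minimal sketch of splitting a service-type string into its two parts. The names ServiceType and parse_service_type are hypothetical and are not part of the Locker Project code; the only thing taken from the text above is the primary/service format.

```python
from typing import NamedTuple


class ServiceType(NamedTuple):
    """A parsed service-type such as contact/google (hypothetical helper)."""
    primary: str   # the higher-level type, e.g. "contact"
    service: str   # the more specific type, e.g. "google"


def parse_service_type(value: str) -> ServiceType:
    """Split 'primary/service' into its parts, rejecting malformed input."""
    primary, sep, service = value.partition("/")
    if not sep or not primary or not service or "/" in service:
        raise ValueError(f"not a valid service-type: {value!r}")
    return ServiceType(primary, service)


if __name__ == "__main__":
    for name in ("contact/google", "photo/flickr", "link/chrome"):
        print(parse_service_type(name))
```

Anything that needs to route on the primary type alone (as the Collections described later do) can then read the primary field and ignore the rest.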

Working from the bottom up, we start with the Connector. The Connector exists to sync data from other sources into your Locker. Retrieval should collect as detailed and complete a set of data as possible and store it in a historical archive. This data has a service-type; for example, records from the Google contact manager would be contact/google. Each service-type will have a document describing the details of the information that is stored, attempting to keep as much of it represented in JSON as is feasible.

Based on the primary type of the data (the “contact” in contact/google) we can begin to expose an API to the Collection and App layers. Every primary type will have a document that states which API endpoints a Connector must expose. For example, a contact type might require a /getAllContacts endpoint that returns a JSON array of all of the contacts the Connector has found, each in as complete a JSON representation as possible. There might also be a /getContactById endpoint that returns the complete and detailed information for a single contact, as documented for its service-type.
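As a rough illustration of what such a contract could look like, here is a sketch of a Connector that answers the two example endpoints from an in-memory store. Everything beyond the two endpoint paths is an assumption made for the example: the class name, the handle method, the "id" field and parameter, and the choice to return JSON null for an unknown id. A real Connector would serve these over REST rather than through a method call.

```python
import json
from typing import Any, Dict, List, Optional


class GoogleContactConnector:
    """Hypothetical Connector for the contact/google service-type."""

    service_type = "contact/google"

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        # Index the archived records by id (assumed to be a field on each record).
        self._records = {str(r["id"]): r for r in records}

    def get_all_contacts(self) -> str:
        """Return every known contact as a JSON array."""
        return json.dumps(list(self._records.values()))

    def get_contact_by_id(self, contact_id: str) -> str:
        """Return one contact as JSON, or JSON null if the id is unknown."""
        return json.dumps(self._records.get(str(contact_id)))

    def handle(self, path: str, params: Optional[Dict[str, str]] = None) -> str:
        """Dispatch an endpoint path to the matching method."""
        params = params or {}
        if path == "/getAllContacts":
            return self.get_all_contacts()
        if path == "/getContactById":
            return self.get_contact_by_id(params["id"])
        raise LookupError(f"unknown endpoint: {path}")


if __name__ == "__main__":
    connector = GoogleContactConnector([{"id": "1", "fullname": "Ada Lovelace"}])
    print(connector.handle("/getAllContacts"))
    print(connector.handle("/getContactById", {"id": "1"}))
```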

These documents will also describe the JSON properties that are expected to be found in events generated by the Connector. To continue our contact example, we could imagine that the document would require the JSON object in a contact/google event to have the properties fullname, email, and number. The event should still send as detailed an object as possible, but these properties would be required for further use and understanding.
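A check along these lines could be sketched as follows. The required property names come from the imagined example above; the REQUIRED_EVENT_PROPERTIES table, the function name, and the decision to key the table by primary type are assumptions for illustration only.

```python
from typing import Any, Dict, List

# Assumed table: required event properties per primary type.
# Only the "contact" entry comes from the example in the text.
REQUIRED_EVENT_PROPERTIES = {
    "contact": ("fullname", "email", "number"),
}


def missing_event_properties(service_type: str, obj: Dict[str, Any]) -> List[str]:
    """Return the required properties that are absent from an event object."""
    primary = service_type.split("/", 1)[0]
    required = REQUIRED_EVENT_PROPERTIES.get(primary, ())
    return [name for name in required if name not in obj]


if __name__ == "__main__":
    event_object = {
        "fullname": "Ada Lovelace",
        "email": "ada@example.com",
        "nickname": "Ada",  # extra detail is fine, only absences matter
    }
    print(missing_event_properties("contact/google", event_object))
```

Extra properties pass through untouched, which matches the goal of sending as detailed an object as possible while still guaranteeing a known minimum.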

It’s important to note that a single Connector could potentially handle multiple service-types, so it would need to implement the defined API for all of the types that it provides.

The next layer up is the Collection, which is the main processing layer of the stack. Collections use the exposed Connector APIs and events to watch for a given primary type and then process it. They attempt to collate, dedupe, and generally aggregate as much as possible of a single primary type. Throughout this process they preserve as much of the complete data from the Connector level as possible. Collections also keep track of where the data came from, so it can be traced back to a specific Connector and object id and any lost or extra details can be retrieved. This aggregated data is then saved in a more dynamic and queryable data store (to be expanded upon in another doc). The Collection can then expose the data in as many meaningful ways as possible. This could be a simple /getAllContacts endpoint, or potentially even an interface that runs a complex query against the data using something like the Mongo-style syntax. These endpoints would be documented in the primary type API document.
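The collate-and-track-provenance idea can be sketched in a few lines. This is only an illustration under stated assumptions: matching contacts on a lowercased email address is a stand-in for whatever dedupe rule a real Collection would use, and the function name, the "sources" field, and the "id" field on each record are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collate_contacts(
    records: Iterable[Tuple[str, Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Merge (service_type, record) pairs that share an email address.

    Each merged contact keeps a 'sources' list so it can be traced back
    to the Connector service-type and object id it came from.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for service_type, record in records:
        key = record["email"].strip().lower()  # assumed dedupe key
        entry = merged.setdefault(key, {"sources": []})
        for name, value in record.items():
            if name != "id":
                # First value seen wins; later sources only fill gaps.
                entry.setdefault(name, value)
        entry["sources"].append({"service_type": service_type, "id": record["id"]})
    return list(merged.values())


if __name__ == "__main__":
    contacts = collate_contacts([
        ("contact/google", {"id": "g1", "fullname": "Ada Lovelace", "email": "ada@example.com"}),
        ("contact/other", {"id": "o7", "email": "ADA@example.com", "number": "555-0100"}),
    ])
    for contact in contacts:
        print(contact)
```

Because each merged record carries its sources, an App that needs a detail the Collection dropped can go back to the originating Connector with the stored id.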

The final layer, the Apps layer, is the simplest in terms of API. Apps are API consumers, primarily using the Collections to get data for user interaction. They might need to go back to the Connector level if they know of a more specific piece of information that wasn’t carried through the Collection layer. The focus here is interaction. You might find yourself starting to create a new data type based on how the user interacts with the data, or perhaps on what the user creates in the app. At that point it’s best to consider creating a new Connector and Collection, then refocus the App on the user interaction elements built on that new stack.

So Where’s the API Again?

Hopefully this illustrates that there are different levels of API access for the different components of the Locker. The API endpoints are defined first by the type, and then by the service that is using that type. Each type will get a document in the docs directory containing all of the arguments, return values, and expectations.

Sounds like we really know what’s going on here! Well, we have a good design concept in place, but all the development work is still underway. Our Connectors and Collections are currently a prototypical evolution of disparate APIs and use cases, with most things hardcoded to just work until we update the internal architecture as we learn from each implementation. The next big development push focuses on these areas, though, and we need help! We need your input to help shape the APIs and developer usability of the system, so jump into IRC (freenode #lockerproject) and get in on the conversations as we grow!
