Rewriting the Religion of Directory Services

by Nick Crown, Director of Product Marketing

In reading a recent post on our blog, I was reminded of a few false tenets about LDAP-based directories that we often hear in the marketplace. But before I get down to my own version of myth busting, allow me to set the stage on a few items.

As we’ve stated here previously, LDAP is an application-level protocol for accessing directories, and directories are simply data stores that hold objects as hierarchically organized name/value pairs. There are no restrictions on the types of objects that can be stored in a directory, although traditionally the most common are those representing, or directly related to, people. The most common attributes stored on those person objects are usernames and passwords. I may be over-generalizing a bit, but the classic use case for an LDAP-based directory is authentication via username and password. Given that, is it any surprise that LDAP-based directories have traditionally been optimized for reading data? It makes perfect sense. And this gets me to:
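To make the "hierarchical name/value" idea and the classic authentication use case concrete, here is a minimal sketch in Python. It is a toy in-memory model, not an LDAP client or server: the DNs, the `DIRECTORY` dictionary, and the `children` and `simple_bind` helpers are all made up for illustration, and the unsalted SHA-256 digest stands in for whatever password storage scheme a real directory would use.

```python
import hashlib
import hmac

# Toy directory: each entry is keyed by its distinguished name (DN) and maps
# attribute names to lists of values. Every DN and value here is hypothetical.
DIRECTORY = {
    "dc=example,dc=com": {"objectClass": ["domain"]},
    "ou=people,dc=example,dc=com": {"objectClass": ["organizationalUnit"]},
    "uid=jdoe,ou=people,dc=example,dc=com": {
        "objectClass": ["inetOrgPerson"],
        "uid": ["jdoe"],
        "cn": ["Jane Doe"],
        # Unsalted SHA-256 is for illustration only; real servers use
        # salted, purpose-built password storage schemes.
        "userPassword": [hashlib.sha256(b"s3cret").hexdigest()],
    },
}


def children(base_dn):
    """Return the DNs that sit directly beneath base_dn in the hierarchy."""
    suffix = "," + base_dn
    return [
        dn for dn in DIRECTORY
        if dn.endswith(suffix) and "," not in dn[:-len(suffix)]
    ]


def simple_bind(dn, password):
    """Mimic username/password authentication against a stored entry."""
    entry = DIRECTORY.get(dn)
    if entry is None:
        return False
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison against each stored password value.
    return any(
        hmac.compare_digest(digest, stored)
        for stored in entry.get("userPassword", [])
    )
```

Calling `simple_bind("uid=jdoe,ou=people,dc=example,dc=com", "s3cret")` succeeds, while a wrong password or an unknown DN fails. Note that the whole operation is a lookup and a comparison, with no writes, which is why the read-optimized reputation took hold.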

False Tenet #1: LDAP is only for read-intensive applications.

LDAP directories have been the best solution for read-intensive applications because their schema, indexes, and related settings are typically tuned to support read operations. That is not to say, however, that you cannot tune an LDAP-based directory server for a more balanced mix of reads and writes. In fact, in many of the environments where our solution is deployed, our directory server is configured to support exactly this type of scenario. In one such environment, the system is designed to support millions of write operations per second in total system throughput, spread across many individual nodes, while simultaneously supporting a similar order of magnitude of reads per second.

As with anything at scale, there are always tradeoffs, and it is impossible to support every use case with a single, homogeneously designed set of data stores. Just look at the data store diversity within any major Internet brand today (e.g., Google, Facebook, or Twitter) and you will see multiple data store technologies, each designed and configured to support a different set of use cases. So classifying LDAP technology as suited only for read-intensive workloads isn’t accurate. We have designed the UnboundID Directory Server to handle writes at the same level as a database, in order to break the old religion that write-intensive applications require a database.

False Tenet #2: Directories can only be accessed via the LDAP protocol.

LDAP-based directories have had a long and storied past, and they appear poised to remain a useful technology for the foreseeable future, but LDAP is only one of many possible means of accessing data stored within a directory server. As stated in a previous post, our directory server is essentially a NoSQL solution built on a key/value data store that is currently accessed primarily via LDAP. While we built this company on the premise that a carrier-grade LDAP-based directory server was lacking in the world, we also recognize that not everyone is a fan of LDAP. In fact, LDAP expertise can be difficult to come by in some industries, and it’s not necessarily the easiest protocol to work with for developers who expect every service to be accessible via HTTP. That is one of a number of reasons why we have been investing in the SCIM specification. SCIM, a REST-based API specification for identity data, is an excellent example of an alternate protocol that can be leveraged for accessing a directory server, whether for reading or writing data.
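For developers who think in HTTP, a read and a write over SCIM might look roughly like the sketch below. It assumes the SCIM 2.0 conventions from RFC 7644 (the `/Users` endpoint, the `filter` query parameter, the `application/scim+json` media type, and the PatchOp message schema); the host name, user id, and helper function names are hypothetical, and nothing here reflects the exact API surface of any particular product. The functions only build the requests and never send them.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical SCIM 2.0 base URL; substitute your own server's endpoint.
BASE_URL = "https://directory.example.com/scim/v2"


def build_user_lookup(user_name):
    """Build a GET that searches for a user by userName (a read)."""
    # Note: this sketch does not escape quotes inside user_name.
    query = urllib.parse.urlencode({"filter": f'userName eq "{user_name}"'})
    return urllib.request.Request(
        f"{BASE_URL}/Users?{query}",
        headers={"Accept": "application/scim+json"},
        method="GET",
    )


def build_user_patch(user_id, attribute, value):
    """Build a PATCH that replaces one attribute on a user (a write)."""
    body = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "path": attribute, "value": value}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/Users/{urllib.parse.quote(user_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/scim+json"},
        method="PATCH",
    )
```

Passing either request to `urllib.request.urlopen` would send it to a real server. The point is simply that both reads and writes reduce to ordinary HTTP verbs and JSON, with no LDAP-specific tooling on the client side.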

False Tenet #3: Directories are only suited for people or identity data.

Another false tenet related to directory servers has to do with the type of data that they store. This gets back to the statement made earlier concerning people data as being the most common type of data seen in traditional deployments. In almost all of the documentation or tutorials available on LDAP, the examples used revolve around people or organizational data. These examples most often are centered on employee-related use cases involving a white pages or phone book-like application. After all, we are talking about directories here.

As with the first tenet, because directories have traditionally been used for identity data and ship with standard schemas that support a wide range of identity-related use cases, it makes sense that they are considered a natural choice for identity data. That is precisely why our company chose an LDAP directory as the foundation of our customer/subscriber/user data platform. And while the overwhelming majority of the objects stored in LDAP installations today are directly related to people, there are no restrictions on the type of data that can be stored in a directory. In fact, the schema is completely open and flexible. While we are not witnessing an explosion of non-people data finding its way into our directory, we are seeing a massive expansion in the number and types of attributes being attached or linked to the traditional people-related objects (think Facebook-style profile). This fits with the broader trends in the market that are generating more online services, and therefore more online identities, and an increasing number of connections between those identities. Of course, this data has to live somewhere, and if it is of a reasonable size and must be delivered in real time to enable these online services, then our data platform is an excellent fit.
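As a rough illustration of that flexibility, the sketch below models entries as plain attribute-to-values maps in Python. It is not a real directory API: the `add_values` and `matches` helpers are invented for this example, and `favoriteGenre` and `linkedAccount` are hypothetical custom attributes that a real deployment would first have to define in its schema. The `device` entry uses a standard LDAP object class to show that an entry need not describe a person at all.

```python
# A traditional person entry (all values are made up).
person = {
    "objectClass": ["inetOrgPerson"],
    "uid": ["jdoe"],
    "mail": ["jdoe@example.com"],
}

# A non-person entry, linked back to its owner by DN.
device = {
    "objectClass": ["device"],
    "cn": ["set-top-box-42"],
    "owner": ["uid=jdoe,ou=people,dc=example,dc=com"],
}


def add_values(entry, attribute, *values):
    """Add values to a multi-valued attribute, skipping duplicates."""
    existing = entry.setdefault(attribute, [])
    for value in values:
        if value not in existing:
            existing.append(value)


def matches(entry, attribute, value):
    """Equality match: does the attribute hold this exact value?"""
    return value in entry.get(attribute, [])


# Grow the profile with attributes that have nothing to do with a phone book.
add_values(person, "favoriteGenre", "documentary")
add_values(person, "linkedAccount", "twitter:jdoe", "facebook:jane.doe")
```

The person entry keeps its original identity attributes and simply accumulates new ones, which is the "Facebook-style profile" growth pattern described above.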

As we’ve discussed here, LDAP directories are simply data stores (ours is of the key/value type) that have traditionally been accessed via the LDAP protocol. The UnboundID Directory Server is optimized for both read- and write-intensive applications, can be accessed via multiple protocols, and can store data other than identity information. In fact, if you need a data solution that supports performance at scale, a distributed data model via a built-in data replication service, integrated security and privacy controls, and a real-time, bi-directional synchronization service for integrating disparate data sources, then you should consider our solution. And LDAP is not required.