The relational database is becoming less and less useful in a web 2.0 world. The reason is that, while the relational database model is great for storing information, it is horrible for storing knowledge. By knowledge I mean information that has value beyond the narrow current conception of the given application: information that can have enduring value. In this context, one might say knowledge is information in non-disposable form.
The relational database doesn’t represent knowledge very well because it is only good at storing objects and the relationships between them when one understands up front exactly which objects and which relationships will be managed. When you need to represent some new type of relationship between the objects in a relational database, doing so tends to be very difficult, if it is possible at all. In fact, the relational database isn’t even particularly good at adding new types of objects. Most relational databases actually have an upper limit on the number of object types, typically referred to as tables, that they can handle, and too many tables in a database schema is considered bad design.
The way I usually describe the situation is to say that the relational database is brittle but strong. As long as you don’t want to radically change or expand the scope of what you are doing, relational databases are great. But knowledge is an ever-expanding universe of objects and relationships between them. The relational database doesn’t handle that use case very well.
Storing the relationships between objects *in* the objects is a problem.
The essence of why the relational model doesn’t handle the more dynamic model of knowledge, as opposed to information, is that relational databases are built around the idea that the relationship between objects is *built into* the objects. For example, invoices are typically stored as one type of object in a database. Customers are a different type of object. An invoice knows, *as part of its structure*, who the customer is. That pointer to the customer is stored *in* the invoice.
This is bad.
It is bad because it means that, in order to create new relationships between different object types, we need to modify those object types. For example, if the developer decides to allow payment records to be connected to invoices, either the structure of the payment record or the structure of the invoice must change. So, with a relational model, you really want to make all the decisions about the valid types of relationships between objects right from the very beginning, because you don’t want to have to modify the structure later.
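To make the payment example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`invoices`, `payments`, `invoice_id`) are made up for illustration, and adding a column to the payment table is just one way a developer might wire up the new relationship. The point is only that the relationship cannot be recorded until the structure of one of the tables changes.

```python
import sqlite3

# Hypothetical schema: invoices and payments start out unrelated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO invoices (id, amount) VALUES (1, 250.0)")
conn.execute("INSERT INTO payments (id, amount) VALUES (10, 100.0)")

# Later we decide payments should point at invoices. The relationship
# lives inside the payment record, so the payment table itself must change.
conn.execute(
    "ALTER TABLE payments ADD COLUMN invoice_id INTEGER REFERENCES invoices(id)"
)
conn.execute("UPDATE payments SET invoice_id = 1 WHERE id = 10")

# The payment structure now carries a column it did not have before.
payment_columns = [row[1] for row in conn.execute("PRAGMA table_info(payments)")]
linked_invoice = conn.execute(
    "SELECT invoice_id FROM payments WHERE id = 10"
).fetchone()[0]
```

On a toy table this is one statement. On a live system with years of data and code that depends on the old structure, it is the "major surgery" this post is complaining about.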
“Excuse me Mrs. Smith. We require you to decide on all of your child’s friends for life before you go into labor.”
Think about this.
Needing to know your database structure upfront is like needing to make a list of all of your unborn child’s potential friends. Forever. This list must even include future friends that have not been born yet, because once the child’s friends list is built, adding to it requires major surgery.
This rigidity prevents most developers from trying to build knowledge. They just capture information. Data is stored in separate unchangeable relational silos. Every time we think of a new way to represent or expand information we just make a new silo, because adding to or modifying an existing silo is way too difficult.
The societal implications of this fracturing and splintering of information are profound. And yet, the converse is incredibly empowering. How great would it be if, when we thought of a new piece of information we wanted to capture, we could simply add it to our existing database? Perhaps, if we could add things this easily, it would be more like a knowledge base than a database. Such flexibility would mean that we could leverage the new information in the context of the existing information, building newly accessible insights along the way.
For example, imagine starting out with a contact list. Some months later, you add a restaurants list. Some months later again, you decide it would be great to be able to capture, for each contact, what their favorite restaurants are. Ideally one would want to just establish a “favorite” relationship between a restaurant and a contact without changing the restaurant structure or the contact structure. This is a simple example, but the bigger point is that relationships between pieces of information will always be more complex tomorrow than they are today. Capturing and leveraging new types of information to increase knowledge should be a key design goal of modern databases.
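Here is a rough sketch of what that could look like, in plain Python rather than any particular database product. All of the names (`contacts`, `restaurants`, `relationships`, `relate`, `related`) are invented for illustration. The relationships are kept in their own collection as (subject, relation, object) entries, so the contact and restaurant records never need to change.

```python
# Two independent lists, created months apart. Neither knows about the other.
contacts = {"alice": {"email": "alice@example.com"}}
restaurants = {"luigis": {"cuisine": "Italian"}}

# Relationships are stored separately from the things they connect.
relationships = []

def relate(subject, relation, obj):
    """Record a relationship without touching either record."""
    relationships.append((subject, relation, obj))

def related(subject, relation):
    """Return everything the subject is connected to by this relation."""
    return [o for s, r, o in relationships if s == subject and r == relation]

# Remember the contact's structure before adding the new relationship type.
contact_fields_before = set(contacts["alice"])

# Months later: a brand new kind of relationship, with no structural change.
relate("alice", "favorite", "luigis")
favorites = related("alice", "favorite")
```

Adding a second kind of relationship later, say "allergic to", would be one more call to `relate` with a different relation name.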
Those computer science guys are on to something with that graph stuff.
The concept of having relationships between objects be separate from the objects themselves is the core concept behind what is known in computer science as graph theory. Graphs are collections of pieces of information and the connections between those pieces of information. In a graph, the pieces of information are called “nodes,” and the connections between nodes are called “edges.” Computer scientists like graphs because they are a universal way of expressing almost any type of information.
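As a minimal sketch of the idea, assuming nothing beyond the Python standard library, here is a tiny graph with labeled edges. The `Graph` class, its method names, and the node identifiers are all hypothetical. Real graph stores are far more capable, but the shape is the same: nodes hold their own properties, and edges live outside the nodes.

```python
from collections import defaultdict

class Graph:
    """A toy labeled graph: nodes carry properties, edges live apart from them."""

    def __init__(self):
        self.nodes = {}                # node id -> properties
        self.edges = defaultdict(set)  # (source id, label) -> set of target ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges[(source, label)].add(target)

    def neighbors(self, source, label):
        # Sorted so the result order is predictable.
        return sorted(self.edges.get((source, label), set()))

graph = Graph()
graph.add_node("invoice-100", kind="invoice", amount=250.0)
graph.add_node("customer-1", kind="customer", name="Acme Co.")
graph.add_node("payment-10", kind="payment", amount=100.0)

# The invoice no longer stores a pointer to its customer; an edge does.
graph.add_edge("invoice-100", "billed_to", "customer-1")
# Connecting payments to invoices later is just another edge.
graph.add_edge("payment-10", "pays", "invoice-100")
```

Notice that the invoice node's properties are exactly what they were when it was created. Nothing about the invoice had to change in order to connect it to a customer or a payment.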
Too many computer scientists spoil the semantic web stew.
The graph is the underlying model of a new, much-discussed but rarely used data storage concept called the semantic web. The semantic web is really, in its simplest form, the idea that information on the web should be stored in databases structured like graphs. This would allow information on the web to be much more intelligently accessible and expandable in a way that relational database systems are not.
Sounds good.
Unfortunately, the semantic web is proof that while a little geek is good, too much geek is, well, too much geek. The problem is that the people who created the semantic web were just way too smart. In fact, if you read even the watered-down Wikipedia description of the semantic web, it sounds like useless abstract gobbledygook. As a result, the semantic web is too great a leap from the tried-and-true relational database. In fact, it doesn’t even feel like relational database users were a target audience for the semantic web architects. But whether or not they were aggressively targeting mainstream database developers, the gap between the two methodologies is far too great, not only because the semantic web is hard, but because relational tools are being greatly simplified, which just widens the gap.
Specifically, newer technologies like the ActiveRecord system in Ruby on Rails have done a great job of abstracting much of the mind-numbing complexity out of the relational model. Now, along comes the semantic web just in time to make us all feel really dumb again. The semantic web makes the relational database model feel positively Fisher-Price. The semantic web is, and will be, for most developers, a non-starter.
Hey man, I just wanna build a little web app!
But the biggest issue with the semantic web is that it is really conceived to solve problems that your average everyday web developer just doesn’t care about. It is an ivory tower solution. Ironically, the concept of a graph representation of information is totally relevant to someone building a web 2.0 application. But the tools, languages, and methodologies of the semantic web do not have the scrappy, agile, PHP web developer in mind. And so, for most such web developers, the semantic web is irrelevant.
In short, the relational database is old and ill-suited to the modern data management world. The graph model is much easier and more appropriate for typical web tasks. But it needs to be productized in a way that makes it easy for developers to fit it into their workflow.
Of course, once you start thinking of information as a graph, all sorts of interesting things become possible. There is much more to talk about, but for now this should be sufficient food for thought.