Wednesday, March 26, 2008

Database Gurus Say: Don’t Tase Me Bro!

Last month I wrote an article called The Death Of The Relational Database. The basic premise is that in the future, graph databases, particularly for web 2.0 applications, will be far more useful than relational databases. As my readership has grown and people have discovered the piece, it continues to interest people. Yesterday it had a major traffic spike, which triggered a new rash of comments.

Some of yesterday’s responses to the article have brought to mind a phenomenon that I first observed in college. When people fear some new technology will change the nature of and potentially the value of their high priest status, they react negatively to that possibility.

The first time I observed this phenomenon was in 1984, when the Mac was introduced. I was at the University of Pennsylvania, which was one of the twenty or so schools that Apple had partnered with. As a result there were a lot of us Mac guys around. But I was a CSE student at the Moore School of Electrical Engineering. All of our facilities were in the electrical engineering school, including the computer lab. And so you can imagine this was an extraordinarily geeky place.

The guys running the computer labs were all Unix guys. They hated the Macs. In fact they hated the *idea* of the Mac. The idea that something could be friendly or easy was offensive to them. The idea that anyone would or could use anything other than a command line was incomprehensible. Graphical user interfaces sucked! Arguments included the idea that if you didn’t know what you were doing you really shouldn’t be using a computer.

Of course now that just sounds stupid.

The next really big revolution of this sort that I can recollect was when Aldus introduced PageMaker and the whole concept of desktop publishing. In this case it was the graphic design community that suggested that desktop publishing was not a good thing. Making it too easy meant there would be all these horrible designs out there from people that didn’t know what they were doing. Heaven forbid!

Once again, today, that just sounds stupid.

I could go on, but the point is these cycles will never go away. The digital photography revolution yielded the same resistance. The experts are always telling us how dumb we are for seeking simplifications and abstractions for things that they are expert at. The cycle repeats itself with every innovation. Paradigm shifts are always perceived as a threat to the status quo. And sometimes they actually are. And so the arguments are always:

  • It’s not powerful enough.
  • There’s no need.
  • It’s dangerous.

But the underlying psychological framework is really a fear of irrelevancy. If you make things too simple my expertise will be less important. *I* will be less important.

And so, as I have made the argument that for Web 2.0 applications, the graph model is far more effective than the relational model, the “high priests” are coming out of the woodwork. The arguments are the same as they always are: not powerful enough, unnecessary, and dangerous.

So as I read the comments yesterday, one theme that emerged was that anything that you can do with a graph you can do with a relational database – and the relational model is more powerful. This of course is very similar to the GUI/command line argument. Why make something easy? You can do all that with a command line!

But I must say my favorite is always when you get to the “dangerous” argument. That is an inflection point. When you get to the dangerous argument, you are being honored by an unwitting concession speech. It is *always* the sign you have won. And around 5pm est. it happened. One of the high priests lobbed the long ball. “Managing a database without a DBA is like letting children drive trucks.”

Interception. Touchdown. Game Over.

42 comments:

tamberg said...

Christensen called it disruption (http://www.google.com/search?q=christensen+innovators+dilemma).

Regards,
tamberg

chuck said...

The commenter later said "I work with people who can't master Excel." I take it he sees DBAs as truck drivers, and people who can't figure out Excel as the children. It sounds like part of what you're proposing is a database for programmers and web developers that lets them build database-backed applications without requiring a DBA specialist. I wouldn't let children drive a truck either, but I don't need a truck, and hence shouldn't need to hire a professional truck driver, to get my groceries home from the store.

I'm really intrigued by these posts and interested in whatever you're working on. As a developer, naturally I work on applications that need to store data, but relational DB is a constant source of annoyance, often feels like overkill, and doesn't seem to suit the kind of agile development process that one needs when your understanding of the application's problem domain is incomplete at the start, and evolving as you work.

Anonymous said...

Do it! Do it! Do it!!!

Anonymous said...

Hi,
I'm a reporter from News.com working on a story about developers creating apps for Android's and Apple's mobile platforms. Was hoping to talk to you for the story. Could you contact me?

Best,
Stefanie
Stefanie Olsen
Senior Writer
CNET News.com (www.news.com)
415-344-2507
stefanie.olsen@cnet.com

235 Second Street San Francisco 94105

Justin D-Z said...

I love managing my database. I'm in Marketing, for God's sake and got a C (I think) in my lone Oracle class in college. The difference? I'm not building cloudware by myself at home in the evenings.

Excellent bit about the concession speech. I felt pangs of that reading one of Steve Yegge's articles on why IDEs were bad/unnecessary. He's right, but only for certain types of people. He'd probably agree.

Chris Cummer said...

As a long-time RDb user with no graph model experience, I found your original post an interesting read, but its brevity left me thinking "ok... so what's an example of a graph db system in practice though?" A quick googling took care of that, but I suspect having a real-world example in your post might have helped to mitigate us RDb users' innate tendency to mentally model your graph examples as key-based relationships in our heads.

Regardless, the comments on that post are some of the best stuff I've read in a while. While at times argumentative they're also extremely informative and thoroughly enjoyable.

And you compose yourself well through the restraint shown in a number of your responses; I'm not sure I would have been so... disciplined.

Good posts all-around.

darose said...

Yep, couldn't agree more Chuck. I think you state the problem quite concretely here:

As a developer, naturally I work on applications that need to store data, but relational DB is a constant source of annoyance, often feels like overkill, and doesn't seem to suit the kind of agile development process that one needs when your understanding of the application's problem domain is incomplete at the start, and evolving as you work.

I've been hacking on a DB that tries to address a bunch of these issues. (With a different approach than Hank - a graph DB doesn't really address the itches I'm trying to scratch.)

I think a couple of Greg's comments from the "Death Of The Relational Database" post are pretty telling too:

For lots of data it's more important to maintain data integrity than it is to make the database friendly to users.

and

You haven't shown any alternative [to RDBMS'] that would be both friendlier to non-experts and more powerful.

Being able to counter these 2 statements is the key to the success of an effort like this. Right now, the big issue with having a key-value store database is that it later becomes difficult to query for a particular datum within a massive soup of heterogeneous key-value data. The people that can come up with a solution that provides both flexibility *and* data query performance will win here.

Hank Williams said...

@Chuck

You are essentially correct, though there is actually an aspect to what we are doing that can be reasonably exposed even to end users. But the "pain point" you describe is exactly what we are addressing.

@Chris

Thanks. I will certainly write more about what we are doing, but I think I will also write about some of the other things that people are doing that address in different ways some of the things we are thinking about. Examples include thruDB, couchDB, and others. They are not at all equivalents to what we are doing, but they are taking a somewhat different approach to some of the same problems.

@david
The thing about "you haven't shown..." is not really fair. This is not a product announcement. I am trying to discuss conceptual issues that are solved by the graph model. I don't believe that in order to discuss these issues I need to write a white paper on our technology. I am not selling anything yet. What I do want to do is engage in discussions about ideas.

David Mathers said...

Hi Hank, I like to be brief so this might sound bad, but I mean everything in the friendliest possible way.

You and almost all the people arguing against you are wrong about almost everything. I don't mean to say anyone's opinion is wrong. I'm talking about basic understanding of what certain words mean.

Let me break it down. There are 3 models for "programming" (in the general sense) computers:

1. Functional
2. Relational
3. Imperative

The functional model can't store data and therefore can't be used to create a database. So there are fundamentally only 2 kinds of databases.

A database created using the Relational model uses relations to both store and retrieve data, so let's call it a relational database. A database created using the Imperative model uses pointers & nodes to store data and pointer navigation to retrieve data, so let's call it a navigational database.

That's it. There are only 2 database models. Each model can be used to implement different kinds of databases based on the limits they place on the structure.

There are 2 primary kinds of navigational database: graph/network and tree/hierarchy. A filesystem for example is a tree/hierarchy database.

A relation is basically a truth table with columns that are related to each other by a truth statement and rows of truth values that fulfill the truth of the statement. A standard relational database doesn't place any limits on the number of rows or columns. A binary database limits the number of columns to 2.

A SQL DBMS is a (partially successful) attempt to implement a language that can be used to create a database which uses the relational model.

OK, the important parts:

1. The semantic web is an implementation of the relational model that limits the relations to 3 columns and a single row.

2. Just because you use a SQL DBMS to create a database doesn't mean you actually created a relational database. You can put pointers in your tables, turning your relations into nodes on a graph, turning your database into a navigational database with some relational features.
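
Point 1 can be pictured with a small sketch. This is only an illustration, assuming Python's standard sqlite3 module; the table name `triples`, its column names, the sample facts, and the helper functions are all made up for this example and do not come from any semantic web toolkit. It stores subject/predicate/object statements as one three-column relation and answers a two-step question with an ordinary self-join:

```python
import sqlite3

# Hypothetical three-column relation holding subject/predicate/object statements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("San Francisco", "is_in", "California"),
        ("Sacramento", "is_in", "California"),
        ("California", "is_in", "USA"),
    ],
)


def objects_of(subject, predicate):
    """Return every object stated for (subject, predicate)."""
    rows = conn.execute(
        "SELECT object FROM triples "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


def two_hops(subject, predicate):
    """Follow the same predicate twice by joining the relation to itself."""
    rows = conn.execute(
        "SELECT t2.object FROM triples AS t1 "
        "JOIN triples AS t2 "
        "  ON t2.subject = t1.object AND t2.predicate = t1.predicate "
        "WHERE t1.subject = ? AND t1.predicate = ? ORDER BY t2.object",
        (subject, predicate),
    )
    return [row[0] for row in rows]
```

Nothing in the query follows a stored pointer; the second hop is found by matching values, which is the sense in which the comment calls this relational.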

Much of what you said in your original post was exactly backwards. You said "relational sucks" but the things you described as problems were features of navigational databases, not relational. Then you said "the semantic web is awesome because it's a navigational database" when in fact it's a relational database.

That's all.

Hank Williams said...

@david

Then I guess the problems we and others are trying to address don't really exist. Thankfully there are a lot of people who seem to understand what we are talking about and who want a solution to the nonexistent problems using current RDBMS tools. That being said, I am happy to be so "wrong" :)

David Mathers said...

Huh?

I said explicitly that I don't think your opinions are wrong. Your terminology is wrong. The semantic web uses a relational database model. When you implement a pointer database using SQL RDBMS you have a pointer database, not a relational database.

Personally, I think SQL and all the DBMS's based on it suck pretty hard.

Hank Williams said...

@David

I am so sorry. I obviously misunderstood. I now re-read and see that you are talking about terminology. Still though, I disagree with this statement:

"You said "relational sucks" but the things you described as problems were features of navigational databases, not relational."

I do not believe when people build apps using mysql, that they are building navigational databases. If that is your point, I respectfully disagree.

Anonymous said...

"I do not believe when people build apps using mysql, that they are building navigational databases."

Only if they put pointers in their tables. Which is a popular practice. Ruby On Rails for example does this by default.

Hank Williams said...

@anonymous.

By pointers - just to get the terminology right - do you mean foreign keys? I am not a rails guy so I don't know in detail how they do stuff.

cratuki said...

> Graphical user interfaces sucked!
[...]
> Of course now that just sounds
> stupid.

That's ridiculous!

{raises keyboard and grips above head}

From my cold, dead hands!!!

David Mathers said...

"By pointers - just to get the terminology right - do you mean foreign keys?"

No, foreign keys aren't part of the data model. They're a SQL integrity constraint.

In the same way you could write a constraint that says "reject email addresses that don't contain @ symbols because they're obviously invalid" you can use a foreign key to say "reject all orders that contain names of products we don't sell"

But no, foreign keys have nothing to do with the relational model as such. They're more of a "framework" issue.
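
The two constraints described in that comment can be sketched side by side. This is a minimal illustration, assuming Python's standard sqlite3 module; the `products` and `orders` tables, their columns, and the `try_order` helper are hypothetical names invented for the example. Both rules are declared the same way, as constraints that make the database reject a row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE orders (
        -- "reject email addresses that don't contain @ symbols"
        email TEXT CHECK (email LIKE '%@%'),
        -- "reject all orders that contain names of products we don't sell"
        product_name TEXT REFERENCES products(name)
    )
    """
)
conn.execute("INSERT INTO products VALUES ('widget')")
conn.commit()


def try_order(email, product_name):
    """Attempt an insert; return False if any constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)", (email, product_name)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, `try_order("a@example.com", "widget")` is accepted, while a missing `@` or an unknown product name is rejected with the same kind of integrity error.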

David Mathers said...

Rails implements pointers by creating an attribute called "id" in every table and filling it with mysql generated sequence numbers.

Hank Williams said...

@David,

I know what a foreign key is. I think you are answering a question I didn't ask. I didn't ask "are foreign keys part of the relational model". I was asking *anonymous* what he was referring to when he said Rails used pointers. Specifically he said "Only if they put pointers in their tables". I am not clear what exact construct in Rails he is referring to.

David Mathers said...

It looks like I posted my followup at the same moment you posted your response. I hope it answers your question.

Anonymous was me in that message. I just forgot to sign my name.

Hank Williams said...

David,

Thanks. I was amused that we seemed to have hit the enter key at exactly the same time.

Hank Williams said...

David,

I don't agree that just because something uses unique auto-generated IDs, that makes the database navigational. As I would define navigational, there are no constraints on connecting records in the database. In an RDBMS there are always such constraints. I don't believe it is possible to build a high performance general navigational database by writing code purely in any known implementation of SQL. We are actually using an SQL database as an information store, but all of the interesting stuff happens outside of the RDBMS and we don't even use any relations or joining. We just use the RDBMS because it is an optimized data store. The key for this kind of stuff is not worrying about normalization in the traditional RDBMS sense, along with intelligent sharding, caching, and parallelism across servers in a map/reduce-ish sort of way. Stonebraker hates MapReduce/BigTable and the ideas behind them for databases, and yet Google is the *only* company to do any of this kind of stuff at scale. When operating at true scale no RDBMS - out of the box - would be able to, for example, handle Gmail. So our thinking is much more in line with Google's way of thinking than Stonebraker or the typical RDBMS folks.

Emil Eifrem said...

Very interesting discussion and I couldn't agree more on the flexibility of the graph model for most interesting new application areas. My company Neo Technology (still in stealth mode) is working on a commercial graph database system, which we offer as open source. It's available at: http://neo4j.org.

(Be warned though that documentation at that site is still a bit sparse. It should improve over the course of the next month.)

We've built systems on top of this database for 7+ years now and are astounded at the flexibility of using a schema-less, arbitrarily interconnected key-value based graph as a persistence model. In particular, it maps a lot better to our cognitive model of most domains (we refer to this as "white-board friendly"), it allows for easier refactoring of the data model (schema evolution) and it has substantially (several orders of magnitude) better performance for certain operations that lead to joins in a relational database.
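
For readers who have not seen the idea, a schema-less key-value graph can be pictured with nothing more than dictionaries and a list of labelled edges. The sketch below is plain Python and is not the API of any graph database product; the node ids, property names, edge labels, and both helper functions are assumptions invented for illustration:

```python
from collections import deque

# Each node is just a bag of key-value properties; no two nodes need the same keys.
nodes = {
    "alice": {"kind": "person", "name": "Alice"},
    "bob": {"kind": "person", "name": "Bob", "nickname": "Bobby"},
    "carol": {"kind": "person", "name": "Carol"},
    "dave": {"kind": "person", "name": "Dave"},
    "springfield": {"kind": "city", "founded": 1850},
}

# Edges connect any node to any other node: (source, label, target).
edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "lives_in", "springfield"),
]


def neighbors(node_id, label):
    """Nodes reached from node_id by one edge carrying the given label."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


def reachable(start, label, max_depth):
    """Breadth-first traversal along one edge label, up to max_depth hops."""
    seen = {start}
    found = []
    frontier = deque([(start, 0)])
    while frontier:
        node_id, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the requested depth
        for nxt in neighbors(node_id, label):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, depth + 1))
    return found
```

Adding a new kind of node or a new property means adding a dictionary entry, with no schema change. A real system would index edges by source rather than scanning the list as `neighbors` does here.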

Feel free to check it out and join the discussions on our mailing list!

Hank: if you'd like to include this in your overview/review of alternative DB implementations, I'd be happy to answer any questions that our documentation may not provide for you.

Interesting thread!

Emil Eifrem
CEO Neo Technology
http://neotechnology.com

David Mathers said...

Ok, I understand what you mean by foreign keys now. So the answer to your question:

"By pointers - just to get the terminology right - do you mean foreign keys? "

Is yes. That's exactly what I mean. Relational databases don't have foreign keys at all in the sense that you are talking about. And yes, that's exactly how Rails works.

For instance in Rails (ActiveRecord) if you create a Cities table and a States table they will both get "id" columns (pointers) which will be filled with a mysql generated sequence. The Cities table will also get a "state_id" column (to store the state pointer).

Then you can do this:

my_city = City.find_by_name("San Francisco")

And then, since there are pointers in your tables, you can use navigation to get the state name:

my_state_name = my_city.state.name

The only way to get that data with a relational database would be to use a join.
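
The difference between the two access styles can be sketched without Rails. The example below is a rough illustration using Python's standard sqlite3 module, not a description of what ActiveRecord actually executes; the table layout follows the comment (`id` and `state_id` columns), while the function names and the placeholder population figure are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER);
    INSERT INTO states (id, name, population) VALUES (1, 'California', 1000);
    INSERT INTO cities (id, name, state_id) VALUES (1, 'San Francisco', 1);
    """
)


def state_name_by_navigation(city_name):
    """Two lookups: read the city's state_id, then follow it like a pointer."""
    (state_id,) = conn.execute(
        "SELECT state_id FROM cities WHERE name = ?", (city_name,)
    ).fetchone()
    (state_name,) = conn.execute(
        "SELECT name FROM states WHERE id = ?", (state_id,)
    ).fetchone()
    return state_name


def state_name_by_join(city_name):
    """One declarative query: the database matches the rows itself."""
    (state_name,) = conn.execute(
        "SELECT states.name FROM cities "
        "JOIN states ON states.id = cities.state_id "
        "WHERE cities.name = ?",
        (city_name,),
    ).fetchone()
    return state_name
```

Both functions return the same answer from the same tables; what differs is whether the application code or the query does the connecting.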

That's all I have to say on the subject, so in closing I'll just say that if by "death of relational" you really mean "death of SQL" (which I think you do) then I say yes yes, it can't happen soon enough. SQL is relational COBOL and it hangs around our collective neck like an albatross. The death of SQL will mean the birth of relational.

David Mathers said...

"The only way to get that data with a relational database would be to use a join."

Ack, this sentence is technically wrong and I hope doesn't cause confusion. It wasn't a very good example. Pretend I said "state population" instead of "state name". Thanks.

darose said...

@David

For instance in Rails (ActiveRecord) if you create a Cities table and a States table they will both get "id" columns (pointers) which will be filled with a mysql generated sequence. The Cities table will also get a "state_id" column (to store the state pointer).

Then you can do this:

my_city = City.find_by_name("San Francisco")

And then, since there are pointers in your tables, you can use navigation to get the state name:

my_state_name = my_city.state.name



I'm not sure I understand: why do people think this approach (i.e., "pointers" in db records) is a problem?

Chris Date routinely rants about this as a problem but, near as I can tell, his whole argument against it is:

a) this isn't really "relational", and
b) the relational model "beat" the navigational model long ago, therefore the navigational model is bad

But ... WTF?!?!? What does any of that prove? This seems to me to be a quite intelligent and practical way to store data. And, frankly, how else would Date suggest it be stored? Does anyone know: how is one "supposed to" do something like this in a "truly relational" way?

David Mathers said...

@darose

"But ... WTF?!?!? What does any of that prove? This seems to me to be a quite intelligent and practical way to store data."

I agree. Chris Date and his people are relational supremacists. Here's a question for them:

If the relational model "beat" the navigational model then why are (as far as I know) all the filesystems on all the computers of the world still navigational and not relational?

I know that Microsoft is/was working on a relational filesystem that they had planned to ship with Windows Vista. I'm not sure what happened to it.

They are also hypocrites. They jump all over people who say things like "death of relational" because it's the equivalent of saying "death of logic". But they do this at the same time as saying "relational beat navigational", which is the equivalent of saying "logic beat algorithms".

In their fantasy world people should be able to interact with computers using pure logic and not have to use algorithms at all. Maybe in 20 years. Maybe never.

Tim said...

awesome article and conclusion. touchdown.

Greg Jorgensen said...

I think you are interpreting healthy skepticism as attacks on your ideas, or you personally. Our industry is always swirling with half-baked bullshit, so until someone does to the relational model what the GUI did to the command line or IDEs did to text editors I think you need to expect to face skepticism -- the "relational databases suck" meme is really old and so far all attempts to replace it have failed.

In other words, don't make the mistake of conflating skeptics who are actually interested in what you have to say with the turf-protecting naysayers you ridicule. My personal experience in this business is that skeptics greatly outnumber naysayers; the obvious fact that things do progress (e.g. your own examples) proves my point.

I'm sure you've noticed that Mac OSX is now a leading Unix distro, so it's open to debate which "side" actually won.

I do get humor but titling your post "Don't Tase Me Bro" is a little too provocative. You're introducing a tinge of racism and (the threat of) physical violence (directed at you) into a technical argument. Maybe I'm sensitive to it because I live in a city (Portland) that has seen some nasty racist taser incidents, but your headline hit a false note with me.

Greg Jorgensen said...

david mathers:

Your "proof" that a database system must be either relational or imperative not only misstates the issue but misuses or redefines common terms. Relational databases are based on set theory and predicate logic. There's no such thing as a "relational" style of programming; object-oriented belongs in your list with functional and imperative. SQL is more like logic programming than any of the examples you gave.

No, foreign keys aren't part of the data model. They're a SQL integrity constraint.

and

... foreign keys have nothing to do with the relational model as such. They're more of a "framework" issue.

Foreign keys are very clearly part of the relational model. Explain how a database can conform to Codd's first and second rules without foreign keys.

Calling foreign keys an "integrity constraint" is misleading at best. Foreign keys can be, and commonly are, defined in the data model without associating an integrity constraint with them (in violation of Codd's tenth rule I will point out, to be fair). Going back to an old example, an invoice table's definition (model) would include the customer key. If the model also includes a constraint defining the invoice.customerid column as a foreign key the database engine will enforce the constraint. However the customerid column exists in the model (the invoice table) independent of the constraint. It's very common to delay adding such constraints to a database model while it is evolving (if you believe that database models can change -- not everyone does). The foreign key can still be used in joins without the constraint. This would be a minor academic quibble except that you build on it when discussing pointers.
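
The invoice example can be tried out in a few lines. This is a sketch under stated assumptions, using Python's standard sqlite3 module; only `invoice.customerid` comes from the comment, and the other column names and sample rows are invented. The `customerid` column is declared with no FOREIGN KEY constraint, yet it still works as a join column, and nothing stops an invoice from pointing at a customer that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customerid INTEGER PRIMARY KEY, name TEXT);
    -- customerid is part of the invoice model, but no constraint is declared.
    CREATE TABLE invoice (
        invoiceid INTEGER PRIMARY KEY,
        customerid INTEGER,
        total INTEGER
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO invoice VALUES (100, 1, 250);
    INSERT INTO invoice VALUES (101, 99, 75);  -- there is no customer 99
    """
)

# The column can be used in a join without any constraint being defined.
joined = conn.execute(
    "SELECT invoice.invoiceid, customer.name FROM invoice "
    "JOIN customer ON customer.customerid = invoice.customerid "
    "ORDER BY invoice.invoiceid"
).fetchall()

# Without the constraint, orphaned invoices are accepted silently.
orphans = conn.execute(
    "SELECT invoiceid FROM invoice "
    "WHERE customerid NOT IN (SELECT customerid FROM customer)"
).fetchall()
```

Here `joined` contains only invoice 100, and `orphans` contains invoice 101, which is the integrity gap an enforced constraint would have closed.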

What Rails does with required "id" columns in every table is called surrogate keys. A surrogate key is a meaningless column that only serves to give each tuple (row) a unique value. Surrogate keys can be foreign keys in other tables and used in joins. Surrogate keys are almost always a concession to application code (Rails) or a premature optimization, and that's why Chris Date et al. eschew them (though you can find examples of their proper use in his books).

In practice (as evidenced by Rails) surrogate keys (id columns) are used as pointers so application code -- the Rails ActiveRecord implementation -- can do what would otherwise be done with joins, assuming the underlying database was designed with database integrity in mind, rather than can-I-use-it-with-Rails in mind. Ruby on Rails has simplified (I would say dumbed down) database design because it needs to generate code to access the database, and it's very hard to generate correct (to say nothing of optimized) SQL queries on arbitrary databases.

By introducing the id columns and managing relationships in code Rails greatly simplifies accessing the database from Rails code. I think the cost of this simplification may be high and I am very skeptical about Rails applications and the databases they are built on scaling or maintaining integrity, but what Rails does is not all that horrible considering some of the other routine violations played out with RDBMSs.

Yes, I know Rails can be made to talk to what the Rails wingnuts call "legacy databases," which are defined as any database the Rails programmer can't add surrogate key "id" columns to. If you are looking for a kindred spirit in the fight to wrest control of databases away from DBAs and don't care what other, possibly non-Rails, applications will do with your database, David Heinemeier Hansson has written about it here (Choose a single layer of cleverness) and elsewhere. I think he's wrong about relational databases. He has since backed off and said he was talking about "application databases" (databases used by a single application), not "integration databases," though in the wider more experienced world of programmers who use databases that's a silly and dangerous distinction, unless you can predict all future applications that might access your database.

Rails may not be a complete throwback to navigable databases, but there's no question the Rails community led by DHH is espousing some pretty ignorant ideas about database design and writing code around relational databases.

Chris Date and his people are relational supremacists.

Not sure who "his people" are but I'll take the ad-hominem bait.

If the relational model "beat" the navigational model then why are (as far as I know) all the filesystems on all the computers of the world still navigational and not relational?

In fact why are all the toasters in the world not relational? You are not talking about the same thing Chris Date was talking about (the history of database management systems). I guess the short answers are (a) file systems are not data management systems except in a colloquial sense, and (b) file systems predate the relational model and haven't changed much for a variety of reasons, none of which Chris Date or "his people" have anything to do with.

You're mixing abstraction layers here. How do you think Oracle and MySQL and every other RDBMS physically stores data? Right -- in a file system. Codd's eighth rule says that a relational system must be independent of the physical storage system.

I know that Microsoft is/was working on a relational filesystem that they had planned to ship with Windows Vista. I'm not sure what happened to it.

I am -- they failed to get it working and cancelled the project so Vista could ship. As I explained already this is a cart before the horse problem -- relational databases are not physical storage systems.

They are also hypocrites. They jump all over people who say things like "death of relational" because it's the equivalent of saying "death of logic". But they do this at the same time as saying "relational beat navigational", which is the equivalent of saying "logic beat algorithms".

What "they" actually say is that actual database management systems designed around the navigational (hierarchical, network/graph, what have you) models failed to meet technical and business needs when they had to compete with systems based on relational theory. That is just a plain fact of the market. The relational model can do everything the earlier systems could do while adding many significant advantages. Chris Date and Ted Codd and their "people" did not go around in their database supremacist outfits and blow up the other systems. Those tools died out because they couldn't compete with the relational model.

How anyone fails to see this as an obvious case of a better technology replacing what came before it -- in the same way Java replaced Cobol -- bewilders me. Is it really necessary to call people supremacists and imagine dark conspiracies of worthy technologies somehow fraudulently killed off?

If someone can actually show a technology that offers real advantages over the relational model without giving up data integrity and performance I and all of the rest of the RDBMS supremacists will be the first to pay attention. Until then it's just hand waving and smoke blowing and, in this discussion, a cargo cult of people who don't know very much about RDBMSs and so wear a big chip on their shoulder because they think something better is being kept from them by the relational high priests.

Greg Jorgensen said...

Hank:

I don't believe it is possible to build a high performance general navigational database by writing code purely in any known implementation of SQL.

You are probably right about that.

When operating at true scale no RDBMS - out of the box - would be able to, for example, handle Gmail.

That's true to a point, but it's not a problem of scaling. Every big bank processes more transactions in an hour than gmail processes email, and the databases banks use are built on commercial relational products. One of the reasons commercial RDBMSs have succeeded so well is that they do scale to huge databases, better than the systems they replaced.

I can easily see how sending email and tagging and organizing and all that can be done with an RDBMS -- I've worked on and written RDBMS-backed email systems. But Google is solving a different problem, searching message content, which is not a problem RDBMSs are well-suited for. Google's data structures to support their search algorithms are obviously more sophisticated than inverted indexes (which can be easily implemented in an RDBMS). They have developed their own data structures and techniques for solving their problems that may be superior to RDBMS-based solutions, but that doesn't mean RDBMSs are now obsolete, it just means they aren't the answer for every problem domain.
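
As a rough idea of what an inverted index inside an RDBMS can look like, here is a minimal sketch using Python's standard sqlite3 module. It says nothing about how Google or any real mail system works; the `messages` and `postings` tables, the whitespace tokenizer, the helper functions, and the sample messages are all assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    -- The inverted index: one row per (term, message) pair.
    CREATE TABLE postings (
        term TEXT,
        message_id INTEGER,
        PRIMARY KEY (term, message_id)
    );
    """
)


def add_message(message_id, body):
    """Store a message and index each distinct lowercase word in it."""
    terms = set(body.lower().split())
    with conn:
        conn.execute("INSERT INTO messages VALUES (?, ?)", (message_id, body))
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?)",
            [(term, message_id) for term in terms],
        )


def search(*terms):
    """Return ids of messages containing every one of the given terms."""
    wanted = sorted({t.lower() for t in terms})
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        "SELECT message_id FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY message_id HAVING COUNT(*) = ? "
        "ORDER BY message_id",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]


add_message(1, "lunch on friday")
add_message(2, "friday status report")
add_message(3, "lunch menu")
```

With the three sample messages, `search("friday")` finds messages 1 and 2, and `search("lunch", "friday")` narrows that to message 1. Ranking, stemming, and phrase search are left out, which is part of the gap the comment is pointing at.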

Google doesn't make any integrity, completeness, or performance guarantees, and that's an important point to remember when comparing Google's data management tools to RDBMSs. Oracle may not be well-suited for gmail, but by the same token map/reduce is not a great system for managing billions of financial transactions.

None of us "relational supremacists" would claim that the relational model suits every data management need. However the relational model does suit almost all of the data management needs encountered so far, which is why the relational model is ubiquitous.

You may be on to a problem domain that will not be well-served by relational technology. You may, like Google, develop an alternative technology that gives you a better fit than an RDBMS. Great -- nothing wrong with that. I use gmail myself. But none of that means that RDBMSs are dead. I use key/value pair databases a lot when my application doesn't need an RDBMS, but I don't imagine that approach making relational databases go away.

Hank Williams said...

greg,

" I think you are interpreting healthy skepticism as attacks on your ideas, or you personally."

Nope. I just think people who make irrational arguments about why things are "dangerous" or "can be done the old way" when a new way is simpler, are all in a similar category. I am not offended, but having seen it for years I do enjoy labeling it for what it is. You seem to fear being termed "unhealthy". I promise it is not personal.

"My personal experience in this business is that skeptics greatly outnumber naysayers; the obvious fact that things do progress (e.g. your own examples) proves my point."

huh? Logic please.


"I'm sure you've noticed that Mac OSX is now a leading Unix distro, so it's open to debate which "side" actually won."

Please re-read the post. My point was the Unix guys were against non-command line UIs. It is clear that graphical UIs have won. Therefore this statement makes no sense.

"I do get humor"

No, you don't.

" but titling your post "Don't Tase Me Bro" is a little too provocative. You're introducing a tinge of racism and (the threat of) physical violence (directed at you) into a technical argument."

Please google "don't tase me bro". It is a reference to a quote from a *white* protester being tased while at a John Kerry (I believe) speech. It's been all over the news and YouTube. Perhaps you have missed it. In fact you have the roles reversed. In my scenario the "naysayers" are the ones asking not to be tased. There is no racial angle here and I am not referencing the idea that someone might do me harm. I have no such thoughts or intent to raise race. Aside from your general "argument from authority" style, which does tend to limit its effectiveness, I do not perceive anything particularly improper in your (or anyone else's) arguments.

Hank Williams said...

"That's true to a point, but it's not a problem of scaling. Every big bank processes more transactions in an hour than gmail processes email"

I would be *shocked* to find that this is so. What big bank has more customers, or anywhere near as many, as Gmail has mailboxes? And how many transactions hit a typical bank account a day? Moreover, SQL databases scale vertically, not horizontally, which is a really big problem for web applications, which is really what my piece is about.

", and the databases banks use are built on commercial relational products."

But not mainstream hardware.

" One of the reasons commercial RDBMSs have succeeded so well is that they do scale to huge databases, better than the systems they replaced."

But of course in my article I am not referring to things that have been replaced, despite your aggressive desire to create this false analogy.

"I can easily see how sending email and tagging and organizing and all that can be done with an RDBMS -- I've worked on and written RDBMS-backed email systems."

Yes, but you have not written a horizontally scalable RDBMS. Using any standard SQL database, you would have had to do a lot of work in this regard or go get some really big iron to scale.
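To make "a lot of work" a little more concrete, here is a minimal sketch of what application-level horizontal partitioning (sharding) can look like when the database itself does not do it for you. Everything here is an assumption for illustration: the `ShardedMailStore` class, the `mail` table, hashing the mailbox name with CRC32, and the use of in-memory SQLite databases as stand-ins for separate servers.

```python
import sqlite3
import zlib


class ShardedMailStore:
    """Routes each mailbox to one of several independent SQL databases."""

    def __init__(self, shard_count):
        # Each in-memory database stands in for a separate server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for shard in self.shards:
            shard.execute("CREATE TABLE mail (mailbox TEXT, subject TEXT)")

    def _shard_for(self, mailbox):
        # A stable hash of the mailbox name picks the shard.
        index = zlib.crc32(mailbox.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def deliver(self, mailbox, subject):
        self._shard_for(mailbox).execute(
            "INSERT INTO mail VALUES (?, ?)", (mailbox, subject)
        )

    def inbox(self, mailbox):
        # Single-mailbox reads touch exactly one shard.
        rows = self._shard_for(mailbox).execute(
            "SELECT subject FROM mail WHERE mailbox = ? ORDER BY rowid",
            (mailbox,),
        )
        return [row[0] for row in rows]

    def total_messages(self):
        # Cross-mailbox questions must be asked of every shard and merged
        # in application code.
        return sum(
            shard.execute("SELECT COUNT(*) FROM mail").fetchone()[0]
            for shard in self.shards
        )


store = ShardedMailStore(shard_count=4)
store.deliver("alice", "hello")
store.deliver("alice", "lunch?")
store.deliver("bob", "invoice")
print(store.inbox("alice"))
print(store.total_messages())
```

The routing itself is short. The work that is not shown is the hard part: joins and transactions that span shards, and moving data when the shard count changes.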

" But Google is solving a different problem, searching message content, which is not a problem RDBMSs are well-suited for."

Actually, BigTable is not at all about solving full text search. BigTable is about solving the write scaling bottleneck.

" They have developed their own data structures and techniques for solving their problems that may be superior to RDBMS-based solutions, but that doesn't mean RDBMSs are now obsolete, it just means they aren't the answer for every problem domain."

Obviously (or perhaps not) I am not suggesting that RDBMSs will go away. They have a valid place in the data management universe. The real point of the article is that web apps are not a good match for relational databases, and that is where all the development growth is. You don't like ActiveRecord, but the fact that it is so successful is a reflection of the fact that people are yearning for abstractions because the SQL model is not well suited to these developers' problems. If ActiveRecord weren't needed it wouldn't be so successful.

Greg Jorgensen said...

Good luck with whatever it is you're working on, Hank. I'm all for a spirited argument but this is just a bully pulpit. When people take the time to read and comment and bookmark your site you might want to think twice before calling them ignorant, irrational, humorless, arrogant, passive aggressive, inexperienced, etc. That's no way to treat an audience or to get anyone other than sycophants to pay attention.

I did misunderstand the taser reference and for that I apologize and stand corrected. I stand by everything else I've written here.

Over and out.

darose said...

Greg,

I have quite a number of questions about things you've written here:


What "they" actually say is that actual database management systems designed around the navigational (hierarchical, network/graph, what have you) models failed to meet technical and business needs when they had to compete with systems based on relational theory. That is just a plain fact of the market. The relational model can do everything the earlier systems could do while adding many significant advantages. Chris Date and Ted Codd and their "people" did not go around in their database supremacist outfits and blow up the other systems. Those tools died out because they couldn't compete with the relational model.

How anyone fails to see this as an obvious case of a better technology replacing what came before it -- in the same way Java replaced Cobol -- bewilders me.


I think one of the problems here is those assertions above. I have absolutely no evidence to verify them!

Now I'll freely admit that I'm not old enough to recall hierarchical DB's. And although I know from historical record that relational DB's replaced them in the market, that's as far as I can take this logic without some additional supporting facts.

So can you supply any? Why - exactly, and specifically - do you think relational DB's were superior? What - specifically - did they do better than hierarchicals?

Frankly, I think a good part of the interest in hierarchical DB's is because few (if any!) developers working with DB's today can even remember why they're supposed to be "worse"! So that - coupled with the fact that the vocal proponents of RDBMS's don't seem to be providing any clear explanation of *how* they're better *in any practical sense* - is how people can "fail to see it".



If someone can actually show a technology that offers real advantages over the relational model without giving up data integrity and performance I and all of the rest of the RDBMS supremacists will be the first to pay attention.


Now here, I think you're posing the challenge with such a unique set of constraints that only a RDBMS could possibly satisfy all of them!

I assume you're well aware of the numerous new database product/projects that have come out in recent years, such as Google BigTable, CouchDB, SimpleDB, ThruDB, etc.

I think it's clear that they can offer several advantages over SQL databases - work more easily with sparse data, work more easily with hierarchical data, etc.

Do they give up performance? Generally not, frankly.

Do they give up data integrity? Sometimes they do. But so what? Every technology has things that it does and doesn't do. Every technology has its strengths and weaknesses. A new DB system can be stronger in certain areas than a SQL database and weaker in others and still be a completely viable and useful tool. Drawing a line in the sand that the product must contain every single feature of a SQL database including data integrity, consistency, etc. before you can see the value in it is a very narrow-minded viewpoint, IMO.


Until then it's just hand waving and smoke blowing and, in this discussion, a cargo cult of people who don't know very much about RDBMSs and so wear a big chip on their shoulder because they think something better is being kept from them by the relational high priests.


Again, I think you're missing what's really spurring this on. People aren't just arbitrarily deciding to rail against RDBMS's.

* Most developers don't care about "relational calculus". They just want a tool that can let their program store and then re-find data.

* They want this tool to be easy to use. SQL queries, frankly, get far more complex than they ought to be once you start dealing with data of any complexity. People want a less complex way to interact with their database. They need something better than SQL. And *some*one is going to give it to them. Whoever it is stands to garner a great deal of success. And rightly so! If someone can make a product that does everything an existing product does, but is easier to use, then the old product deserves to die!

* Do developers not really "understand" RDBMS's in the Edgar Codd and Chris Date and Fabian Pascal sense? Probably so. (I know I don't. I've been reading up on this stuff and I still can't see how whether or not something is "truly relational" makes any *practical* difference.) But so what? Again, people just want to save and load their data. They don't want to have to have an advanced degree to interact with a database - and they're right. They shouldn't have to.


What's needed (IMO) is a database that's simpler to interact with than SQL. If that DB winds up not being "purely relational" but still achieves that goal then I'll applaud it. I think it'll be a big step forward - and one that's long overdue.

I think, frankly, as Hank opined in another post, some of the people who strongly defend the relational model do so in a very dogmatic way (relational is right; everything else is wrong) without ever taking the time to explain in detail any supporting facts that demonstrate how or why this is the case.

If relational is really "right", then I'm sure that proponents such as yourself will have no problem coming up with examples to demonstrate that.

Until I see that though (and particularly after having recently read through the rantings on Fabian Pascal's site), I remain of the opinion that relational DB's are but one approach - a decidedly very good approach, no question, but nevertheless one with some key weaknesses - and that I'm open to other approaches that might work better. It's this opinion that's guiding me in my own database development project.

Would love to hear any convincing arguments you might have otherwise.

David Mathers said...

@darose

"Most developers don't care about 'relational calculus'. They just want a tool that can let their program store and then re-find data."

Beneath all the mumbo jumbo it really is this simple: navigational means finding your data using some form of system id, relational means finding your data using a predicate. Everything else is secondary.

Here's a thought experiment to illustrate the tradeoffs: imagine you're at the police station looking at a lineup of 6 guys and the police want you to tell them who the bad guy is.

The problem with using a predicate is that it takes some effort to create a predicate that uniquely identifies your data: "the 6 foot 2 guy with green eyes"

If the police create pointers for you by having each man hold an id number then all you have to do is say "number 4", which seems easier and more straightforward.

The problem with pointers is that they have to be managed. If the cops write down "number 4" and then the guys in the lineup trade numbers, then the wrong guy goes to jail. If the cops write down "the 6 foot 2 guy with green eyes" then the right guy will always go to jail.

The main arguments against pointers are:

1. The kind of pointer confusion I illustrated happens all the time in the real world. Like really, all the time. Then you have to sift through your data and figure out "which guy did #4 refer to when I wrote down #4?"

2. There's nothing you can do with pointers that you can't do with a predicate. I read about a town in South America that stopped using mailing addresses (pointers) and to send a letter there you have to write a predicate on the envelope.

The main arguments against relational are:

1. Finding the correct predicates can be difficult and/or tedious (remember, you have to create predicates both to store and retrieve data, whereas to store data with navigational you just lump together some data, give it an id, and call it a day).

2. It requires more computational power. Imagine trying to scale the predicate postal service to New York City. That would require some serious brain power. Chris Date and friends do a lot of hand waving here about how it's just an implementation detail, as if to imply that the engineers who create Oracle are idiots who aren't making a full featured relational engine because they can't figure out how.
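The lineup thought experiment above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Suspect` class, the names "Al" and "Bo", and both lookup functions are hypothetical. The dictionary key plays the role of the id number each man holds, and the filter over attributes plays the role of the predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    name: str
    height: str
    eyes: str


def identify_by_number(lineup, number):
    """Navigational style: follow the id, trusting that it still means the same thing."""
    return lineup[number]


def identify_by_description(lineup, height, eyes):
    """Relational style: select whoever satisfies the predicate."""
    matches = [s for s in lineup.values() if s.height == height and s.eyes == eyes]
    if len(matches) != 1:
        # A predicate only identifies someone if exactly one person satisfies it.
        raise LookupError(f"predicate matched {len(matches)} suspects")
    return matches[0]


lineup = {
    3: Suspect("Al", "5 foot 9", "brown"),
    4: Suspect("Bo", "6 foot 2", "green"),
}

# What the cops wrote down while looking at Bo.
noted_number = 4
noted_description = ("6 foot 2", "green")

# The guys in the lineup trade numbers.
lineup[3], lineup[4] = lineup[4], lineup[3]

print(identify_by_number(lineup, noted_number).name)             # Al
print(identify_by_description(lineup, *noted_description).name)  # Bo
```

The `LookupError` branch is the other side of the tradeoff: the predicate only works if it was written carefully enough to match exactly one person, which is the "difficult and/or tedious" part.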

Hank Williams said...

@ David Mathers

"Beneath all the mumbo jumbo it really is this simple: navigational means finding your data using some form of system id, relational means finding your data using a predicate. Everything else is secondary."

This is a very neat description. I like it, and yet at the same time it feels a little off. I think you can use what one might call a navigational or a predicate based system to find data in any type of database. For me the distinction is more about how the data is structured and how the relations are built than about how you search, although the structure of the relationships will obviously drive the search. The other thing that is interesting is the suggestion that some make that the only way to "describe" data using predicates is the relational model, which is ridiculous.

In any case, I love this statement:

"Chris Date and friends do a lot of hand waving here about how it's just an implementation detail, as if to imply that the engineers who create Oracle are idiots who aren't making a full featured relational engine because they can't figure out how."

I would also add that they totally ignore the complexity of expression issue which you point out very well.

David Mathers said...

@Hank

I was just in the middle of posting a followup. Here it is:

--

This is technically a mistake:

"Beneath all the mumbo jumbo it really is this simple: navigational means finding your data using some form of system id, relational means finding your data using a predicate. Everything else is secondary."

Replace "finding" with "identifying".

You can find your data with search or indexes or whatever. Navigational vs Relational is about how you give identity to your data.

David Mathers said...

Oh! I forgot the other big problem with navigational: data access is mixed into the application, it's not independent.

With relational you have a clean predicate logic based interface to your data. Predicate logic will never change.

With navigational the data access code is part of the application, and we all know how fickle application technology is. The technology you use today might be obsolete in 5 years. How will you access your data then? By rewriting the application.

ActiveRecord is a perfect example. If you use it then you will create a database that's effectively useless without ActiveRecord.

darose said...

@David Mathers:

Have been a bit swamped work-wise, and so bookmarked your last couple of comments to read when I had more time (and less accumulated blog reading piled up). Just finally got to them today, and I'm glad I came back. You posted very good commentary that highlights the issues very simply and clearly, and definitely helps my understanding of the crux of the issue. Thanks much for taking the time to share that.

David Mathers said...

@darose:

I'm happy that my explanation helped!

David Mathers said...

@hank

This is only a hunch, but it occurred to me that you might be conflating "the relational model of data storage and retrieval" presented by Ted Codd in 1969 with "entity/relationship data modeling" presented by Peter Chen in 1976.

The relational model has nothing to say about entities or objects or relationships. It is at a lower level than that. It is only concerned with data access. Specifically it uses the ideas of "subject", "predicate", "domain" (aka type) and "relation" (aka table).

E/R modeling is hugely popular and is often taught as "how to use the RDBMS" or something. But like any high level technique it will make some types of jobs easier while making other types of jobs much harder or impossible.

Relational purists tend to be suspicious of data modeling in general and think E/R modeling in particular is a bad idea.

Don't throw the relational baby out with the E/R modeling bathwater!
