Relational DBMS vs Document Oriented Key-Value Stores?!?!

5 06 2010

In the recent days document databases get big buzz from all different areas: programmers who search for freedom or VCs who search for the next big buck. But there is a big disadvantage: the bigger the noise is the less the people using buzzwords like cloud or key-value store, understand whats behind. So let me start with a short disclaimer: I am not against key-value stores or any other document oriented database!

But nonetheless I think it is required to clarify what a standard relational database can do, especially how powerful a language like SQL is. So lets clarify this by some examples and let me give you my take on the buzz. To make it more entertaining consider this a small Q&A session with a key-value store guy and me as the defender of relational DBMS.

“When I’m working with documents I just dump everything into the object and can map-reduce everything! SQL cant do this.”
Sure you can dump anything in an object, but SQL has a different heritage: SQL is a declarative language that is designed to express the What instead of the How. If you want to express algorithms directly on the database use stored procedures!

“But stored procedures are bad, because they are not portable!”
Ah get that, but how about your custom map-reduce code? Writing custom code for querying data is nothing more than using the data store API directly and is a stored procedure.

“In SQL I can’t create my own functions, I’m limited to MIN,MAX,AVG. That’s so bad!”
Whew, I hope that you did not stop reading about SQL in ’89 or ’92. The SQL standard provides everything you need to define user-defined functions and custom functions. Any major DBMS allows you to even write them in any programming language you can program.

“You know, I want to be flexible, stream content of different types. SQL cant do this!”
Maybe you are missing a point here that is called normalization. Yes, SQL is restricted to flat tables but you can use normalization and queries to create any list of your content.

“But SQL is so slow, every time I create a join I’m lost. Look, XXX-DB is so fast using static indices once I created a map function!”
Dude, using and index to increase query performance is as old as I am. You want indices, go create them! If you don’t know what queries you use, you wrote a bad application. For any other case you might go for index optimization techniques well researched in the last decades.

“But it runs in the cloud! This makes it even more faster and scalable.”
Gnargh, I’m pretty sure you know about all this parallel database research going on since — decades. Just because it is a key-value store does not make it scalable and faster. Please dig deeper.

To conclude, please understand me I’m not against key-value stores or document databases, they have their very specific terrain — as do standard relational databases. So what might be a good example for a key-value store document oriented application? When looking at the characteristics of such databases one thing that comes to my mind immediately is the following: Since every object is stored by a unique key all those applications benefit where the workload is almost only single object lookup and no aggregations take place. Since hash functions that are used to build the indices for the keys are ideal for partitioning scale-out can be implemented easier. Another important fact is the ability for semi-structured data, here document oriented storage system clearly win over relational systems.

The only thing I wish from people propagating document oriented storage is that they do not only talk about it because its cool but because they can sketch a valid use case. If you feel like you don’t know enough about the possibilities of standard relational databases talk to people who do or read something about it.

Did I offend you? Challenge me, let’s discuss!
– Martin