Relational DBMS vs Document Oriented Key-Value Stores?!?!

5 06 2010

In the recent days document databases get big buzz from all different areas: programmers who search for freedom or VCs who search for the next big buck. But there is a big disadvantage: the bigger the noise is the less the people using buzzwords like cloud or key-value store, understand whats behind. So let me start with a short disclaimer: I am not against key-value stores or any other document oriented database!

But nonetheless I think it is required to clarify what a standard relational database can do, especially how powerful a language like SQL is. So lets clarify this by some examples and let me give you my take on the buzz. To make it more entertaining consider this a small Q&A session with a key-value store guy and me as the defender of relational DBMS.

“When I’m working with documents I just dump everything into the object and can map-reduce everything! SQL cant do this.”
Sure you can dump anything in an object, but SQL has a different heritage: SQL is a declarative language that is designed to express the What instead of the How. If you want to express algorithms directly on the database use stored procedures!

“But stored procedures are bad, because they are not portable!”
Ah get that, but how about your custom map-reduce code? Writing custom code for querying data is nothing more than using the data store API directly and is a stored procedure.

“In SQL I can’t create my own functions, I’m limited to MIN,MAX,AVG. That’s so bad!”
Whew, I hope that you did not stop reading about SQL in ’89 or ’92. The SQL standard provides everything you need to define user-defined functions and custom functions. Any major DBMS allows you to even write them in any programming language you can program.

“You know, I want to be flexible, stream content of different types. SQL cant do this!”
Maybe you are missing a point here that is called normalization. Yes, SQL is restricted to flat tables but you can use normalization and queries to create any list of your content.

“But SQL is so slow, every time I create a join I’m lost. Look, XXX-DB is so fast using static indices once I created a map function!”
Dude, using and index to increase query performance is as old as I am. You want indices, go create them! If you don’t know what queries you use, you wrote a bad application. For any other case you might go for index optimization techniques well researched in the last decades.

“But it runs in the cloud! This makes it even more faster and scalable.”
Gnargh, I’m pretty sure you know about all this parallel database research going on since — decades. Just because it is a key-value store does not make it scalable and faster. Please dig deeper.

To conclude, please understand me I’m not against key-value stores or document databases, they have their very specific terrain — as do standard relational databases. So what might be a good example for a key-value store document oriented application? When looking at the characteristics of such databases one thing that comes to my mind immediately is the following: Since every object is stored by a unique key all those applications benefit where the workload is almost only single object lookup and no aggregations take place. Since hash functions that are used to build the indices for the keys are ideal for partitioning scale-out can be implemented easier. Another important fact is the ability for semi-structured data, here document oriented storage system clearly win over relational systems.

The only thing I wish from people propagating document oriented storage is that they do not only talk about it because its cool but because they can sketch a valid use case. If you feel like you don’t know enough about the possibilities of standard relational databases talk to people who do or read something about it.

Did I offend you? Challenge me, let’s discuss!
– Martin




One response

5 06 2010

I have been thinking about the same issue lately and think that a lot of the resentment with SQL derives from the issue of optimizing it.

Sure, throw a few indexes against your table – the database of your choice still has to pick your index. Your database will usually tell you WHAT it is doing (as per EXPLAIN) but not WHY, which might help you to better understand the quirks of your current design OR why the database decided to do something different. As one can see at the PICASSO project ( query optimizers heavily differ in behavior depending on your query — that might or might not be what you want. Especially when dealing with a lot of data, there might be the case that you very clearly know how a certain operation should be done efficiently. Maybe people feel that NoSQL databases give them back the ability to really influence/improve/screw their application’s performance as they won’t have to rely on some “obscure” optimization their database is doing for them.

On the other end, I guess that most of the NoSQL-DBMSes are just getting around a hard part of a SQL database system: Parsing and optimization. Most of them could surely provide a (limited, if not complete) SQL interface that represents what they are doing, but they chose not to and make that the IMO worst possible differentiation to other DBMS, instead of focusing on the underlying storage.

Although I guess a lot of them just have been spoiled by MySQL, the kid that fell out of the stupid tree when god was handing out brains and ended up last in line 😉

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: