adv

Hard truths about the NoSQL revolution

The NoSQL buzzword has been metastasizing for several years. The excitement about these fast data stores has been intoxicating, and we're as guilty as anyone of seeing the groundbreaking appeal of NoSQL. Yet the honeymoon is coming to an end, and it's time to start balancing our enthusiasm with some gimlet-eyed hard truths.

Don't get us wrong. We're still running to try the latest experiment in building a simple mechanism for storing data. We still find deep value in MongoDB, CouchDB, Cassandra, Riak, and other NoSQL standouts. We're still planning on tossing some of our most trusted data into these stacks of code because they're growing better and more battle-tested each day.

But we're starting to feel the chafing, as the NoSQL systems are far from a perfect fit and often rub the wrong way. The smartest developers knew this from the beginning. They didn't burn the SQL manuals and send nastygrams to the sales force of their once devoted SQL vendor. No, the smart NoSQL developers simply noted that NoSQL stood for "Not Only SQL." If the masses misinterpreted the acronym, that was their problem.

This list of gripes, big and small, is thus an attempt to document this fact and to clear the air. It's meant to set things straight now so that we can do a better job understanding the trade-offs and the compromises.

NoSQL hard truth No. 1: JOINs mean consistency
One of the first gripes people have about SQL systems is the computational cost of executing a JOIN between two tables. The idea is to store the data in one and only one place. If you're keeping a list of customers, you put their street addresses in one table and use their customer IDs in every other table. When you pull the data, the JOIN connects the IDs with the addresses and everything remains consistent.

The trouble is that JOINs can be expensive, and some DBAs have concocted complex JOIN commands that boggle the mind, turning even the fastest hardware to sludge. It was no surprise that the NoSQL developers turned their lack of JOINs into a feature: Let's just keep the customer's address in the same table as everything else! The NoSQL way is to store key-value pairs for each person. When the time comes, you retrieve them all.

Alas, people who want their tables to be consistent still need JOINs. Once you start storing customers' addresses with everything else about them, you often end up with multiple copies of those addresses in each table. And when you have multiple copies, you need to update them all at the same time. Sometimes that works, but when it doesn't, NoSQL isn't ready to help with transactions.

Wait, you say, why not have a separate table with the customer's information? That way there will only be one record to change. It's a great idea, but now you get to write the JOIN yourself in your own logic.



Comments