Bluelock Blog

A Cloudy Future for Relational Databases

June 1, 2010 by John Ellis
I remember quite vividly IBM's push for SQL compliance on their AS/400 platform. Twenty-some years ago, databases had to be relational, tying together a vast sea of disparate columns. Relations between tables enforced a kind of consistency and normalization. No more brute-forcing random data into your corporate accounting system... now you had to obey the rules!

...or so the thinking went at the time.

Slowly, deep in the seedy database underground, seditious computer scientists sat stewing. They waited for the day when engineers realized that sometimes the process of normalizing data mutated it past the point of recognition. They knew one day some devious developer would see that relationships were too computationally expensive and slow. And one day... ah yes, one day... people would give up their crazy ad-hoc "Structured Query Languages."

While these computer scientists and software engineers were shoved to the margins by enterprise computing, a few small companies took note of how well these rogue database systems scaled to millions of users and petabytes of data. Lilliputian firms such as "Google," "LinkedIn" and "Facebook" started to lead a NoSQL revolution, running contrary to the dominant relational databases and instead storing mind-boggling amounts of data in non-relational tables and retrieving it faster than RDBMSs one-hundredth their size.

Non-relational databases have become incredibly effective, especially when backed by the scalable resource pool of a cloud computing provider such as BlueLock. Take a look at Redis, a powerful key-value store that can scale to a massive size, and that sense of scale quickly becomes apparent. By removing relational constraints, one can avoid building and maintaining a huge number of indexes and instead deal out content quickly and efficiently. Craigslist has already leveraged Redis extensively, and VMware sees quite a future in it as a platform as well.

If we take a step beyond, we can see an entire landscape emerging: key-value and column-family stores such as Redis, Voldemort and Cassandra, hierarchical stores such as ZooKeeper, and tuple stores provided by JavaSpaces and Apache River. The number of choices seems to grow every day, and without a farm of servers it becomes quite a daunting task to evaluate which one fits your project best.

My recommendation is to take a step back and see which solution best fits the problem you are working within. Re-evaluate your needs and objectively ask yourself:
  • What business or logic problem am I really trying to solve?
  • How large will this data grow within a year? Are we talking about megabytes or petabytes?
  • How fast does the data need to be retrieved?
  • Do I really need to perform a bunch of ad-hoc queries? Or am I just looking up values based on their primary key?
  • Which solution is easiest to deal with? Which makes the most sense to me?
  • Do I need relational data? Do I need hierarchical data? Do I even care?
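
To make the ad-hoc query versus primary-key question concrete, here is a minimal sketch in Python. It is only an illustration: a plain dict stands in for a key-value store and an in-memory SQLite database stands in for an RDBMS, and the sample data and the function names (get_user, users_in_city) are made up for this example.

```python
import sqlite3

# Hypothetical sample data; the names and cities are illustrative only.
USERS = [
    (1, "alice", "Indianapolis"),
    (2, "bob", "Chicago"),
    (3, "carol", "Indianapolis"),
]

# Key-value style: every read is a single lookup by primary key.
# A dict is used here as a stand-in for a real key-value store.
kv_store = {
    user_id: {"name": name, "city": city} for user_id, name, city in USERS
}


def get_user(user_id):
    """Return the record stored under user_id, or None if it is absent."""
    return kv_store.get(user_id)


# Relational style: the same data, but open to questions nobody planned for.
# An in-memory SQLite database is used here as a stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)


def users_in_city(city):
    """Ad-hoc query: names of all users in the given city, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]
```

If every read in your application looks like get_user, a key-value store is probably worth a serious look. If you keep inventing questions like users_in_city, the relational model is still earning its keep.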

Once you build a matrix comparing each solution, you will find some implementations quickly sink to the bottom and others become very tempting choices. Once you have determined a short list of possibilities, it is best to fire up a data store and write a few quick proof-of-concept test applications. A convenient way to do this is to log in to your BlueLock vCloud Express account, spin up several virtual machines and load up an array of Linux boxes to test each solution out. Measure how easily the product can be installed and test how easily it can be scaled to multiple servers. Do some performance testing against sample applications on your own fenced network and watch your local resource utilization.
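
One way to build that comparison matrix is a simple weighted score. The sketch below is only an illustration: the criteria, the weights, the candidate names and every score are placeholders invented for this example, not measurements of any real product. Swap in your own answers to the questions above.

```python
# Hypothetical weights: how much each criterion matters to this project.
WEIGHTS = {"scale": 3, "lookup_speed": 3, "ad_hoc_queries": 1, "ease_of_use": 2}

# Placeholder scores from 1 (poor) to 5 (excellent).
# These are NOT measurements of any real data store.
SCORES = {
    "candidate_a": {"scale": 5, "lookup_speed": 5, "ad_hoc_queries": 1, "ease_of_use": 4},
    "candidate_b": {"scale": 3, "lookup_speed": 3, "ad_hoc_queries": 5, "ease_of_use": 3},
    "candidate_c": {"scale": 4, "lookup_speed": 2, "ad_hoc_queries": 2, "ease_of_use": 2},
}


def weighted_total(scores, weights):
    """Sum each criterion's score multiplied by that criterion's weight."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank(candidates, weights):
    """Return (name, total) pairs sorted from highest total to lowest."""
    totals = {
        name: weighted_total(scores, weights)
        for name, scores in candidates.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank(SCORES, WEIGHTS):
        print(f"{name}: {total}")
```

The totals only narrow the field. The proof-of-concept applications and performance tests are what should decide between the candidates left at the top.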

Very soon after you use your vCloud Express account to test the top candidates, you should feel one or two "fit" in a much more natural way than the others. For example, ZooKeeper may be the natural fit for someone wanting to house a slew of centralized configuration data. At this point you can take the next step and test the candidate alongside your web applications to judge more accurately the level of effort needed to get things running.

If at the end of this arduous process you still can't decide between a couple of top candidates, do what I always do: pick the project with the best mascot. You simply can't go wrong.

Don't forget - once you select a data store implementation you can have your own scalable, elastic cloud to grow into. BlueLock can not only help you horizontally scale your data tier, but also help design server layouts that best fit the sometimes eclectic world of non-relational databases. Whether your data needs heaps of disk or mountains of RAM to remain resident within, the BlueLock Enterprise Cloud can help your cabal of data power the next big thing.
