2014/06/04

Linux! Champion of Big Data

Big data solutions based on distributed databases such as MongoDB (and Hadoop and others) rely on having very many nodes running in parallel to provide resiliency, performance and scalability.

This is a step up from the two-node cluster model (primary and failover) used for many legacy SQL installations. Two nodes are simply not enough to support resiliency with the sort of distributed database model NoSQL solutions provide (even if such a cluster could scale). For example, you'll need a minimum of three nodes just to allow the election of a primary to work in a replicated cluster, and more again for sharding with MongoDB.
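To see why two nodes fall short, here's a rough sketch of the arithmetic behind a majority-based election. It assumes the usual rule that a primary needs votes from strictly more than half of the voting members; the function names are my own for illustration and are not part of any MongoDB API.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many members can be lost while a majority can still be formed."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # Compare small cluster sizes side by side.
    for n in range(1, 8):
        print(f"{n} nodes: majority={majority(n)}, can lose {fault_tolerance(n)}")
```

With two nodes the majority is two, so losing either node leaves nobody able to win an election. Three is the smallest size that survives a single failure, and a fourth node adds nothing on that front, which is why odd-sized clusters are the norm.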

Of course there's a reason why you've chosen a NoSQL solution in the first place - scale - and at these sizes choosing horizontal over vertical scaling makes sense. This is all good news for Linux, since every extra node has costs associated with it, which will likely dictate that Linux becomes the OS of choice for such solutions instead of Windows or other UNIX OSs. Commodity hardware will likely be the same for all OSs (bar the UNIXes) so the differentiator will be the OS (on price at least).

Of course, if your volumes are low then you can always stick with a SQL database - tried, tested, actually pretty damn good and suited to most problems out there. In many cases SQL should be the default, with NoSQL only if you're forced to it by capacity requirements...

