Linux! Champion of Big Data

Big data solutions based on distributed databases such as MongoDB (and Hadoop and others) rely on large numbers of nodes running in parallel to provide resiliency, performance and scalability.

This is a step up from the two-node "primary and failover" cluster model used for many legacy SQL installations. Two nodes simply aren't enough to provide resiliency with the sort of distributed model NoSQL solutions use (even if they could scale). For example, you need a minimum of three nodes just for the election of a primary to work in a MongoDB replica set, and more again once you add sharding - a minimal sketch of such a replica set is shown below.
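As a rough illustration of that three-node minimum, here's a sketch of initiating a three-member MongoDB replica set using pymongo. The hostnames, port and replica set name ("rs0") are illustrative assumptions, not from any particular deployment; each mongod would also need to be started with a matching --replSet option.

```python
# Minimal sketch: initiating a three-member MongoDB replica set with pymongo.
# Hostnames, port and the replica set name ("rs0") are illustrative assumptions.
from pymongo import MongoClient

# Connect directly to one of the (not yet initiated) members.
client = MongoClient("mongodb://node1:27017/?directConnection=true")

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node1:27017"},
        {"_id": 1, "host": "node2:27017"},
        {"_id": 2, "host": "node3:27017"},
    ],
}

# replSetInitiate kicks off the election; with three voting members the set
# can still elect a primary after losing any single node - with only two,
# the survivor cannot form a majority and no primary can be elected.
client.admin.command("replSetInitiate", config)
```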

Of course there's a reason why you've chosen a NoSQL solution in the first place - scale - and the choice of horizontal over vertical scaling at these sizes makes sense. This is all good news for Linux, since the costs associated with a growing number of nodes will likely dictate that Linux becomes the OS of choice for such solutions instead of Windows or other UNIX OSes. Commodity hardware will likely be the same whatever the OS (the proprietary UNIXes aside), so the differentiator will be the OS itself - on price at least.

Of course, if your volumes are low then you can always stick with a SQL database - tried, tested, actually pretty damn good and suited to most problems out there. In many cases SQL should be the default; go NoSQL only if capacity requirements force you to...
