Scaling on a budget

Pre-cloud era. You have a decision to make. Do you define your capacity and performance requirements in the belief that you’ll build the next top-1000 web-site in the world, or start out with the view that you’ll likely build a dud which will be lucky to get more than a handful of visits each day?

If the former then you’ll need to build your own data-centres (redundant globally distributed data-centres). If the latter then you may as well climb into your grave before you start. But most likely you’ll go for something in the middle, or rather at the lower end, something which you can afford.

The problem comes when your site becomes popular. Worse still, when that popularity is temporary. In most cases you’ll suffer something like the Slashdot effect for a day or so, which will knock you out temporarily but could trash your image permanently. If you started at the higher end then your problems have probably already become terminal (at least financially).

It’s a dilemma that every new web-site needs to address.

Post-cloud era. You have a choice – IaaS or PaaS? If you go with infrastructure then you can, in principle, scale out horizontally by adding more servers when needed. This though is relatively slow to provision* since you need to spin up a new server, install your applications and components, add it to the cluster, configure load-balancing, DNS resiliency and so on. Vertical scaling may be quicker but provides limited additional headroom. And this assumes you designed the application to scale in the first place – if you didn’t then the chances are maybe 1 in 10 that you’ll get lucky. On the up side, the IaaS solution gives you the flexibility to do-your-own-thing, and there’s a good chance your existing legacy applications can be made to run in the cloud this way (everything is relative of course).
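By way of illustration, here’s a minimal sketch of what “adding more servers” means on IaaS, using AWS’s boto3 SDK (the group name ‘web-asg’ is hypothetical, and I’m assuming an auto-scaling group already exists with a launch configuration that installs the application on boot):

```python
# Sketch only: scaling out on IaaS by raising the desired capacity of an
# existing AWS Auto Scaling group. Even fully automated, each new instance
# still has to boot, install the application and join the load-balancer
# pool before it takes traffic -- hence "slow" relative to PaaS.
import boto3

autoscaling = boto3.client('autoscaling')

def scale_out(group_name='web-asg', extra_instances=2):
    # Look up the group's current desired capacity.
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])
    current = groups['AutoScalingGroups'][0]['DesiredCapacity']
    # Ask AWS to provision additional servers behind the load balancer.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=current + extra_instances)

scale_out()
```

Even scripted like this you’re still waiting minutes for instances to boot and warm up, whereas a PaaS scheduler simply routes requests to new workers.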

If you go with PaaS then you’re leveraging (in theory) a platform which has been designed to scale, but which constrains your solution design in doing so. There’s little chance your existing applications will run off-the-shelf (actually, no chance at all really), though if you’re lucky some of your libraries may (may!) work depending on compatibility (Google App Engine for Java, Microsoft Azure for .NET, for example). The transition is more painful with PaaS, but where you gain is in highly elastic scalability at low cost, because it’s designed into the framework.
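To make the constraint concrete, here’s a minimal sketch in the style of the old Google App Engine Python runtime (my assumption – the same point applies to the Java runtime mentioned above): instances are stateless and ephemeral, so shared state has to live in platform services like the datastore rather than in process memory.

```python
# Sketch only: on a PaaS the platform may spin instances up and down at
# will, so nothing important can live in instance memory. Shared state
# goes into the platform's datastore instead (App Engine NDB shown here).
from google.appengine.ext import ndb

class HitCounter(ndb.Model):
    count = ndb.IntegerProperty(default=0)

@ndb.transactional
def increment(counter_id='home'):
    # Fetch the counter (creating it on first use) from the datastore,
    # not from a module-level variable that would vanish with the instance.
    counter = HitCounter.get_or_insert(counter_id)
    counter.count += 1
    counter.put()  # persisted; visible to every instance the platform runs
    return counter.count
```

It’s exactly that enforced statelessness which lets the platform add and remove instances freely – the elasticity you’re paying for.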

IaaS is great (this site runs on it): it’s flexible with minimal constraints, low cost, and can be provisioned quickly (compared to the pre-cloud world).

PaaS provides a more limited set of capabilities at a low price point and constrains how applications can be built so that they scale and co-host with other users’ applications (introducing multi-tenancy issues).

A mix of these options probably provides the best solution overall depending on individual component requirements and other NFRs (security for example).

Anyway, it traverses the rat’s maze of my mind today due to relevance in the news… Many Government web-sites have pitiful visitor numbers until they get slashdotted or are placed at #1 on the BBC website – something which happens quite regularly, though most of the time the sites get very little traffic – peaky. Today’s victim is the Get Safe Online site, which collapsed under load – probably as a result of the BBC advertising it. For such sites perhaps PaaS is the way forward.

* I can’t really believe I’m calling IaaS “slow” given provisioning can be measured in minutes and hours where previously you’d be talking days, weeks and likely months…

Linux! Champion of Big Data

Big data solutions based on distributed databases such as MongoDB (and Hadoop and others) rely on having very many nodes running in parallel to provide resiliency, performance and scalability.

This is a step up from the two-node cluster model (primary & failover) used for many legacy SQL installations. Such a cluster is simply not big enough to support resiliency with the sort of distributed database model NoSQL solutions provide (even if it could scale). For example, with MongoDB you’ll need a minimum of three nodes just for the election of a primary to work in a replicated cluster – elections need a majority of voting members, which two nodes can’t provide once one fails – and more again for sharding.
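As a minimal sketch (hypothetical hostnames node1–node3, each already running mongod with --replSet rs0), this is what initiating that three-member set looks like with pymongo:

```python
# Sketch only: initiating a minimal 3-member MongoDB replica set.
# Elections require a majority of voting members, so with 2 nodes a
# single failure leaves no majority and no primary can be elected;
# the third node is what keeps the cluster writable through a failure.
from pymongo import MongoClient

# Connect directly to one of the (as yet uninitialised) members.
# (directConnection needs a recent pymongo; older versions connect
# directly by default.)
client = MongoClient('node1', 27017, directConnection=True)

config = {
    '_id': 'rs0',
    'members': [
        {'_id': 0, 'host': 'node1:27017'},
        {'_id': 1, 'host': 'node2:27017'},
        {'_id': 2, 'host': 'node3:27017'},
    ],
}
client.admin.command('replSetInitiate', config)
```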

Of course there’s a reason why you’ve chosen a NoSQL solution in the first place – scale – and the choice of horizontal v vertical scaling at these sizes makes sense. This is all good news for Linux, since an increase in the number of nodes has costs associated with it which will likely dictate that Linux becomes the OS of choice for such solutions instead of Windows or the other UNIX OSs. Commodity hardware will likely be the same whichever OS you run (bar the UNIXes), so the differentiator will be the OS itself (on price at least).

Of course, if your volumes are low then you can always stick with a SQL database – tried, tested, actually pretty damn good and suited to most problems out there. In many cases SQL should be the default; NoSQL only if you’re forced into it by capacity requirements…

UK’s security branch says Ubuntu most secure end-user OS (maybe)

Kind of late I know, but I’ve recently completed a new desktop rollout project (to Windows 7) for a UK gov department, and found it interesting that CESG supposedly (see below) think that Ubuntu 12.04 is the most secure end-user OS. There was much discussion on this project around security features and CESG compliance, so I find this topic quite interesting.

They didn’t look at a wide range of client devices, so other Linux distributions may prove just as secure, as could OSX – which seems a notable omission to me considering they included ChromeBooks in the list. It was also pointed out that the disk encryption and VPN solutions haven’t been independently verified, and they’re certainly not CAPS approved; but then again, neither is Microsoft’s BitLocker solution.

The original page under gov.uk seems to have disappeared (likely as a result of all the recent change going on there) but there’s a lot on that site covering end-user device security, including articles on Ubuntu 12.04 and Windows 7.

However, reading these two articles you don’t get the view that Ubuntu is more secure than Windows – in fact, quite the opposite. There’s a raft of significant risks associated with Ubuntu (well, seven) whilst only one significant risk is associated with Windows (the VPN). Some of the Ubuntu issues look a little odd to me – such as “users can ignore cert warnings”, which is more a browser issue than an OS one (unless I’ve misunderstood; the context isn’t very clear) – but the basic features are there, just not certified to any significant degree. This is an easy argument for the proprietary solution providers to make, and a deal clincher for anyone in government not looking to take risks (most of them).

I doubt open-source solutions are really any less secure than these but they do need to get things verified if they’re to stand up to such challenges. Governments around the world can have a huge impact on the market and on the use of open standards and solutions, so helping them make the right decisions seems a no-brainer to me. JFDI guys…

Otherwise, the article does have a good list of the sort of requirements to look out for in end-user devices with respect to security which I reproduce here for my own future use:

  • Virtual Private Network (VPN)
  • Disk Encryption
  • Authentication
  • Secure Boot
  • Platform Integrity and Application Sandboxing
  • Application Whitelisting
  • Malicious Code Detection and Prevention
  • Security Policy Enforcement
  • External Interface Protection
  • Device Update Policy
  • Event Collection for Enterprise Analysis
  • Incident Response