Scaling the Turd

It has been quite some time since my last post… mainly because I’ve spent an inordinate amount of time trying to get an application scaling and performing as needed. We’ve done it, but I’m not happy.

Not happy, in part because of the time it’s taken, but mainly because the solution is totally unsatisfactory. It’s an off-the-shelf (COTS) package so making changes beyond a few “customisations” is out of the question, and the supplier has been unwilling to accept that the problem is within the product, instead pointing to our “environment” and “customisations” – of which, IMHO, neither is particularly odd.

At root there are really two problems.

One – a single JVM can only handle 10 tps (transactions – in this case page requests – per second) according to the supplier. Our requirement is around 50.

Two – performance degrades over time to unacceptable levels if a JVM is stressed hard, so 10 tps realistically becomes more like 1 to 2 tps after a couple of days of soak testing.

So we’ve done a lot of testing – stupid amounts of testing! Over and over, tweaking this and that, changing connection and thread pools, JVM settings, memory allocations etc. with pretty much no luck. We’ve checked the web servers, the database and the queue configuration (itself an abomination of a setup); the CPU is idle, memory is plentiful, garbage collection is working a treat, disk IO is non-existent and network IO is measured in Kb/sec. Nada! Except silence from the supplier…

And then we’ve taken thread dumps and can see stuck threads and lock contention, so we know roughly where the problem lies. We passed this to the supplier but still, silence…
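
For what it’s worth, you don’t need anything from the supplier to see this; a few thread dumps under load tell the story. A minimal sketch (the process pattern here is hypothetical – adjust it for your own app-server):

    # grab a thread dump from the app-server JVM
    PID=$(pgrep -f 'appserver')
    jstack -l "$PID" > /tmp/dump-$(date +%H%M%S).txt

    # how many threads are stuck waiting on locks?
    grep -c 'java.lang.Thread.State: BLOCKED' /tmp/dump-*.txt

    # and which monitors are they queued up behind?
    grep -A3 'BLOCKED' /tmp/dump-*.txt | grep 'waiting to lock'

Take a handful of these a few seconds apart; if the same threads are blocked on the same monitor every time, that’s your bottleneck.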

Well, not quite silence. They finally offered that “other customers don’t have these issues” and that “other customers typically run 20+ JVMs”! Excuse me? 20 JVMs is typical? wtf!? So really they’re admitting that the application doesn’t scale within a JVM, that it cannot make use of resources efficiently within a JVM, and that the only way to make it work is to throw more JVMs at it. Sounds to me like a locking issue in the application – one that no doubt gets worse as the application is stressed. Well, at least we have a fix…

This means that we’ve ended up with 30 JVMs across 10 servers (VMs) for one component to handle a pathetic 50 tps – something I would expect 2 or 3 servers to handle quite easily given the nature of the application (the content delivery aspect of a content management system). And the same problem pervades the application’s other components, so we end up with 50 servers (all VMs bar a physical DB cluster) for an application handling 50 tps… This is neither efficient nor cost-effective.

There are also many other issues with the application, including such idiocies as synchronous queueing, a total lack of cache headers (resulting in a stupidly high hit rate for static resources) and really badly implemented Lucene indexing (closing and opening indexes continually). It is, by some margin, the worst COTS application I have had the misfortune to come across (I’ll admit I’ve seen worse home-grown ones, so I’m not sure where that leaves us in the buy-v-build argument…).
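
The cache-header problem at least is trivial to demonstrate (and would be trivial to fix if the product allowed it). A quick check against any static resource – the URL here is obviously a stand-in:

    # a static resource should come back with sensible caching directives
    curl -sI https://www.example.com/static/site.css | grep -iE 'cache-control|expires|etag'

    # you'd hope for something like:
    #   Cache-Control: public, max-age=86400
    # with nothing at all coming back, every request goes to the origin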

So what’s wrong with having so many JVMs?

Well, cost for a start. Even though we can cram more JVMs onto fewer VMs, we need to step this up in chunks of the RAM required per JVM (around 4GB). So, whilst I’m not concerned about CPU, a 20GB 4vCPU host can really only support 4 JVMs (some space is needed for the OS and other process overheads). Lots of tin, doing nothing.

But the real issue is maintenance. How the hell do you manage that many JVMs and VMs efficiently? You could use clustering in the application server – oh, except that this isn’t supported by the supplier (like I said, the worst application ever!). So we’ve now got monitors and scripts for each JVM and each VM, and when something breaks (… and guess what, with this pile of sh*t, it does) we need to go round each node fixing them one-by-one.
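
In practice “going round each node” boils down to scripts like the sketch below (host and service names are made up) – workable, but it’s a workaround, not a management strategy:

    # bounce the wedged JVMs on every node, one by one
    for host in app01 app02 app03; do        # ...and so on, times 30
      ssh "$host" 'sudo systemctl restart appserver-jvm1 appserver-jvm2'
    done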

Anyway, lessons learned, how should we have scaled such an application? What would I do differently now that I know? (bar using some completely different product of course)

Firstly, I would have combined components together where we could. There’s no reason why some WARs couldn’t be homed together (despite the supplier’s design suggesting otherwise). This would help reduce some of the JVMs and improve the reliability of some components (that queueing mechanism specifically).

Secondly, given we can’t use a real cluster in the app-server, we can (now) use containers to package up each component of the application instead. This then becomes our scaling and maintenance point, and rather than having 50 servers to manage we have 7 or 8 images to maintain (still a lot for such an application). It also allows us to scale up or down at the container level more quickly. The whole application wouldn’t fit this model (the DB in particular would remain as it is) but most of it should.
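
As a rough sketch of what that might look like (the image and component names are purely illustrative), each component is built into an image once and scaled by running more containers, rather than by hand-building more VMs:

    # build one image per component
    docker build -t cms/delivery:1.0 ./delivery

    # run as many instances as the load demands, capping memory per container
    docker run -d --name delivery-1 -m 4g -p 8081:8080 cms/delivery:1.0
    docker run -d --name delivery-2 -m 4g -p 8082:8080 cms/delivery:1.0

    # scaling back down again is just as quick
    docker rm -f delivery-2

The 30 JVMs don’t go away, but they become a replica count behind a load-balancer rather than 30 hand-managed boxes.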

Of course it doesn’t solve the root cause unfortunately but it is a more elegant, maintainable and cheaper solution and, bar eradicating this appalling product from the estate, one that would have been so much more satisfying.

So that’s the project for the summer… work out how to containerise this sort of COTS application and how to connect, route and scale the containers in a way that is manageable, efficient and cost-effective. Next project please!

 

Security, Impact, Truth and Environments

I’ve been known to berate others for abusing environments – despite my personal habits – but I think it’s time for me to curtail my anger and reconsider exactly what distinguishes one environment from another and why.

We’re used to managing a plethora of environments – production, standby, prod-support, pre-production, performance test, UAT, system-test, development-integration, dev etc. – each of which has its own unique characteristics and purpose and each with a not insignificant cost.

With all those environments we can very easily have 5 or 6 times the infrastructure required to run production sitting mostly idle – and yet still needing to be maintained and patched, and consuming kilowatts of power. All this for what can seem like no good reason bar satisfying some decades-old procedural dictate handed down from on high.

Unsurprisingly many organisations try to combine responsibilities into a smaller set of environments to save on $’s at the cost of increased risk. And recent trends in dev-ops, cloud and automation are helping to reduce the day-to-day need for all these environments even further. After all, if we can spin up a new server, install the codebase and introduce it into service in a matter of minutes then why not kill it just as quickly? If we can use cheaper t2.micro instances in dev and use m4.large only in prod then why shouldn’t we do so?
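
That really is how quick and cheap it can be these days – e.g. with the AWS CLI (the AMI id and key name below are invented for illustration):

    # a throw-away dev box: minutes to create...
    aws ec2 run-instances --image-id ami-0abcd1234example --instance-type t2.micro \
        --count 1 --key-name dev-key

    # ...and seconds to destroy when you're done with it
    aws ec2 terminate-instances --instance-ids i-0123456789abcdef0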

So we can shrink the number and size of environments to the point where we only have 2 or 3 times production, and with auto-scaling this baseline capacity can actually be pretty low.

If we can get there…

… and the problem today is that whilst the technology exists, the legacy architectures, standards, procedures and practices adopted over many years by organisations simply don’t allow these tools and techniques to be adopted at anywhere near the pace at which they are developing in the wild. That application written 10 years ago just doesn’t fit with the new cloud strategy the company is trying to develop. In short, revolution is fast (and bloody) and evolution is slow.

Our procedures and standards need to evolve at the same rate as technology and this just isn’t happening.

So I’ve been considering what all these environments are for and why they exist, and I think it comes down to three concerns: security, impact and truth.

– Security – What’s the security level of the data held? More often than not the production environment is the only one authorised to contain production data. That means it contains sensitive data or PII, has lots of access-control and auditing, firewalls everywhere, tripwires etc. There’s no way every developer is going to get access to this environment. Access is on a needs-to-know basis only… and we don’t need (and shouldn’t want) to know.
– Impact – What’s the impact to the business if the environment dies or runs slow? If dev goes down, no-one cares. Hell, if pre-prod goes down no-one bar prod-support really cares.
– Truth – How true to version X does the environment have to be? Production clearly needs to be the correct release of the codebase across the board (MVT aside). If we have the wrong code with the wrong database then it matters. In the development environment? If a script fails then frankly it’s not the end of the world, and besides, dev is usually going to be version X+n, unstable and flaky in any case.

So in terms of governance it’s those things that keep management awake at night. They want to know who’s got access to what, what they can do, on what boxes, with what assets and what the risk is to data exposure. When we want to push out the next release they want to know the impact if it screws up, that we’ve got a back-out plan for when it does and that we’ve tested it – the release, the install plan and the back-out. In short, they’re going to be a complete pain in the backside. For good reason.

But can we rethink our environments around these concerns and does this help? If we can demonstrate to management that we’ve met these needs then why shouldn’t they let us reduce, remove and recycle environments at will?

Production and stand-by will have to be secure and the truth. But the impact if stand-by goes down isn’t the same – there’s a risk to prod if it does, but risk and impact aren’t the same thing. So allowing data-analysts access to stand-by to run all sorts of wild and crazy queries may not be an issue unless prod falls flat on its face – a risk some will be willing to take to make more use of the tin and avoid environment spread. Better still, if the data in question isn’t sensitive or is just internal-use-only then why not mirror a copy into dev environments to provide a more realistic test data-set for developers?

And if the data is sensitive? Anonymise it – or a decent sample of it – and use that in dev and test environments. Doing so will improve the quality of the code by increasing the likelihood that developers detect patterns and edge-cases sooner in the development cycle.
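
Even something as crude as the sketch below (the database, table and column names are invented for the example) goes a long way – the shape and volume of the data stay realistic while the PII disappears:

    # copy production data into the dev database, then scrub anything identifying
    psql dev_db -c "
      UPDATE customers
      SET    email     = 'user' || id || '@example.com',
             full_name = 'Customer ' || id,
             phone     = NULL;"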

In terms of impact, if the impact to the business of an application outage is low then why insist on the full range of environments when frankly one or two will do? Many internal applications are only used 9 to 5 and have an RTO and RPO in excess of 24 hours. The business needs to clearly understand what it’s agreeing to, but ultimately it’s their $’s we’re spending and once they realise the cost they may be all too willing to take the risk. Having five different environments for every application for the sake of consistency alone isn’t justifiable.

And not all truths are equal. Some components don’t need the same rigour as others and may have a lower impact on the business if they’re degraded to some degree. Allowing some components – especially expensive ones – to have fewer environments may complicate topologies and reduce how comprehensively each environment represents the system, but if we can justify it then so be it. We do though need to make sure this is very clearly understood by all involved, else chaos can ensue – especially if some instances span environments (here be dragons).

Finally, if engineering teams paid more attention during development to performance and operability, and could demonstrate this, then the need for dedicated performance/pre-prod environments may also be reduced. We don’t need an environment matching production to understand the performance profile of the application under load. We just need to consider the system’s characteristics and test cases with a willingness (i.e. an acceptance of risk) to extrapolate. A truthful representation of production is usually not necessary.

Risk is everything here, and if we think about how the application’s concerns stack up against the security risk, the impact risk to the business and the risk of things not being the truth, the whole truth and nothing but… then perhaps we can be smarter about how we structure our environments to help reduce the costs involved, irrespective of adopting revolutionary technology.

Traceability

We can have a small server…

…a big server (aka vertical scaling)…

…a cluster of servers (aka horizontal scaling)…

…or even a compute grid (horizontal scaling on steroids).

For resiliency we can have active-passive…

… or active-active…

… or replication in a cluster or grid…

…each with their own connectivity, load-balancing and routing concerns.

From a logical perspective we could have a simple client-server setup…

…a two tier architecture…

…an n-tier architecture…

…a service oriented (micro- or ESB) architecture…

…and so on.

And in each environment we can have different physical topologies depending on the environmental needs, with logical nodes mapped to each environment’s servers…

With our functional components deployed on our logical infrastructure using a myriad of other deployment topologies…

… and on and on and on…

And this functional perspective can be implemented using dozens of design patterns and a plethora of integration patterns.

With each component implemented using whichever products and packages we choose, each responsible for supporting one or more requirements and capabilities…

So the infrastructure we rely on, the products we select, the components we build or buy; the patterns we adopt and use… all exist for nothing but the underlying requirement.

We should therefore be able to trace from requirement through the design all the way to the tin on the floor.

And if we can do that we can answer lots of interesting questions such as “what happens if I turn this box off?”, “what’s impacted if I change this requirement?” or even “which requirements are driving costs?”. Which in turn can help improve supportability, maintainability and availability and reduce costs. You may even find your product sponsor questioning if they really need this or that feature…

Letsencrypt on Openshift

If you really wanted to know you’d have found it, but for what it’s worth, this site now runs on Red Hat’s OpenShift platform. For a while I’ve been thinking I should get an SSL cert for the site. Not because of any security concern but because Google and the like rank sites higher if they are https and, well, this is nonfunctionalarchitect.com and security is kind of a ‘thing’ if you know what I mean. But certs cost £’s (or $’s or €’s or whatever’s). Not real pricey, but still, I can think of other things to spend £50 on.

But hello!, along comes letsencrypt.org. A service allowing you to create SSL certs for free! Now in public beta. Whoo hooo!

It isn’t particularly pretty at the moment and certs only last 90 days, but it seems to work ok. For OpenShift’s WordPress gear you can’t really do much customisation (and probably don’t want to), so installing letsencrypt on that looks messier than I’d like. Fortunately you can create a cert offline with letsencrypt and upload it to WordPress. Steps in a nutshell:

  1. Install letsencrypt locally. Use a Linux server or VM preferably.
  2. Request a new manual cert.
  3. Upload the specified file to your site.
  4. Complete cert request.
  5. Upload the certificate to OpenShift.

Commands:

  1. Install letsencrypt:
    1. git clone https://github.com/letsencrypt/letsencrypt
    2. cd letsencrypt
  2. Request a new manual cert:
    1. ./letsencrypt-auto --agree-dev-preview -d <your-full-site-name> --server https://acme-v01.api.letsencrypt.org/directory -a manual auth -v --debug
  3. This command will pause to allow you to create a file and upload it to your website. The file needs to be placed in the /.well-known/acme-challenge folder and has a nice random/cryptic base-64 encoded name (and what appears to be a JWT token as contents). This is provided on screen; mine was called something like KfMsKDV_keq4qa5gkjmOsMaeKN4d1C8zB3W8CnwYaUI with contents something like KfMsKDV_keq4qa5gkjmOsMaeKN4d1C8zB3W8CnwYaUI.6Ga6-vVZqcFb83jWx7pprzJuL09TQxU2bwgclQFe39w (except that’s not the real one…). To upload this to an OpenShift WordPress gear site:
    1. SSH to the container. The address can be found on the application page on OpenShift.
    2. Make a .well-known/acme-challenge folder in the webroot, which can be done on the WordPress gear after SSHing in:
      1. cd app-root/data/current
      2. mkdir .well-known
      3. mkdir .well-known/acme-challenge
      4. cd .well-known/acme-challenge
    3. Create the file with the required name/content in this location (e.g. using vi).
      1. vi KfMsKDV_keq4qa5gkjmOsMaeKN4d1C8zB3W8CnwYaUI
    4. Once uploaded and you’re happy to continue, press ENTER back on the letsencrypt command as requested. Assuming this completes and manages to download the file you just created, you’ll get a response that all is well and the certificates and key will have been created.
    5. To upload these certs to your site (from /etc/letsencrypt/live/<your-site-name>/ locally), go to the OpenShift console > Applications > <your-app> > Aliases and click edit. This will allow you to upload the cert, chain and private key files. Note that no passphrase is required. You need to use fullchain.pem as the SSL cert on Openshift.com and leave the cert chain blank. If you don’t do this then some browsers will work but others, such as Firefox, will complain bitterly…
    6. Save this and after a few mins you’ll be done.

Once done, you can access the site via a secure HTTPS connection and you should see a nice secure icon showing that the site is now protected with a valid cert 🙂
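
You can also check what’s being served from the command line (swap in your own host name):

    # confirm the certificate issuer and expiry dates being served
    echo | openssl s_client -connect nonfunctionalarchitect.com:443 \
        -servername nonfunctionalarchitect.com 2>/dev/null \
      | openssl x509 -noout -issuer -dates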

Details of the browsers supported by letsencrypt.org are on their website.

Good luck!

 

Woke up this morning…

… annoyed!

Been hearing some daft comments recently like “cost is a secondary concern” and “we shouldn’t worry about cost”... wtf!?

You need a vision. You need a goal. You need to know what the hell it is you’re trying to achieve!

But! The most important thing once you have the vision is how the hell you’re going to get there. Time and money thus become primary concerns, and if you ain’t got the money or you ain’t got the time then you ain’t going there and you need to adjust your expectations!

Hell, I want to go to the moon for a vacation but that isn’t going to happen anytime soon.

And as Hemingway said, “It’s good to have an end to journey toward; but it’s the journey that matters, in the end”.

Waterfall v Agile v Reckless

I was recently asked when I would use an agile instead of a waterfall methodology, to which I don’t think I gave a very good answer at the time. These things tend to dwell in the mind – in this case at 3am! – and though thoughts at such an hour aren’t necessarily to be trusted, here goes.

“Quality, time, cost – pick any two” is an often-quoted project management notion which I’ll not go into. A similar triplet can be used to identify which methodology to choose – functionality, time and cost.

The idea goes like this – you can prioritise any two of functionality, time or cost and doing so will tell you which method to use:

Prioritise functionality & cost over time – use waterfall. You know what you want, so you can plan it all out up-front, minimise technical debt, reduce redundancy and keep costs down. It may, though, take longer than you really want before you see anything go live. Good for fixed-price work, anything where the scope is well defined and time is less of a concern, and some package implementations (package customisation being avoided, as a general rule!).

Prioritise cost & time over functionality – use agile. You want something out there soon, at minimal initial cost (think MVP) and can limit the functionality that’s delivered within these time/cost constraints. You can define short delivery cycles, size these accordingly and constrain the functionality that gets delivered within them. Good for T&M projects and anything where the vision/scope or requirement priorities are less clear or subject to change – i.e. most projects.

Prioritise functionality & time over cost – use the reckless approach. This essentially says you don’t give a damn about the money but you want it all and you want it now! If you ever hear a customer claim “money is no object!” run away. Ok, there are a handful of cases – usually government and compliance projects – where this happens, but these projects are rare and typically indicative of failings elsewhere in an organisation. In all other cases the customer has frankly lost the plot. Essentially the reckless approach is one where you start trying to deliver all the functionality ASAP without much concern for anything else. You will likely throw out a lot of testing, ignore non-functional requirements, use whatever tools are to hand (whether they’re right for the job or not) and ignore any corporate processes, procedures and standards on the way. Hell, it can work! But most often it leads to unforeseen costs late in the project, excessive rework and, if you do go live with it, a lot of tech-debt and long-term costs which need to be paid back.

It’s just unfortunate that the reckless approach is used so often when it should be so rare. It’s also unfortunate that those who deliver reckless projects often get promoted (they delivered on time!) without the long-term cost being fully understood – much like the quarterly-focus problem many large corporations suffer from.

So agile should be most common, then waterfall, then reckless. I’d also suggest waterfall should be used sparingly as there’s a tendency for the project to disappear into a dark room for many months before resurfacing as “not what I wanted” due to poor requirements definition and the lack of a good feedback loop. Agile keeps the customer much more engaged on a day-to-day basis, which provides that feedback loop and minimises such waste.

It should also be noted that neither agile nor waterfall says anything about the work-products needed to deliver them. They don’t say “don’t do requirements definition”, “don’t do design” or “don’t test” – far from it – they just do these things in different ways (user-stories v consolidated requirements definition, for example). You also need an understanding of the long-term vision, scope and dependencies within an agile project to avoid excessive tech-debt, or at least be conscious of the implications of short-term decisions. From an architectural perspective you still need to identify the building blocks, do the design (logical and physical), establish non-functionals, identify key decisions and dependencies etc. But whilst you may go into all the detail up-front in waterfall, in agile you can spread this out with a relatively shallow but broad solution outline up-front (essentially identifying the key building blocks and non-functionals) and go into depth as and when needed. To believe you can avoid such work is to make assumptions and to, consciously or otherwise, adopt the reckless approach…

ForgeRock License Confusion

IANAL which doesn’t help me understand the ForgeRock license situation which looks to be as clear as mud.

Blogs like this suggest I’m not the only one to be confused.
Wikipedia’s page on ForgeRock suggests it’s only the binaries you need a license for (so if you compile it yourself, is that ok?).
And the OpenAM project site suggests two licenses apply. Though one of these (the first one) states “This License represents the complete agreement concerning subject matter hereof”, so how can they then go on to apply another “non-commercial” Creative Commons license constraint?
Joy! 🙁

Don’t Build Your Own Security Solution!

There are only three reasons why building your own security solution is a good idea:

  1. Security through obscurity – There’s less likelihood that anyone will find the holes because no-one else is using it. This is just as well since your home-grown solution is probably riddled with them (see below).
  2. Risk v Cost – You’ve weighed the risk, baulked at the cost and decided the benefits aren’t affordable.
  3. You’re unique! – Some organisations may be unique and/or have unique problems for which there is not an off-the-shelf solution. You may be NASA or a top secret government department, or you could be one of the big internet organisations operating at the edge of technology and scale (fb, Google or the like) or it may just be your business to develop security products.

It’s a fair bet that #3 doesn’t apply to you and if you think the costs are too high then I suspect you’ve not really understood the problem.

There are many reasons why you shouldn’t build your own security solution and should buy off-the-shelf instead:

  1. Risk – The suppliers of security products are much more likely to understand the potential security issues and deal with them correctly than your in-house developers. You may have some really nice people working for you but, despite their assurances, they’re not the experts. The most basic of attack vectors – weak passwords, code injection, CSRF, XSS etc. – can be left exposed by an in-house team who don’t have the experience or knowledge to address them correctly. This point really can’t be stressed enough. I do not consider myself to be a security expert but I have seen problems in every single home-grown solution I’ve come across.
  2. Features – Even basic administration capabilities require effort to implement – adding users, granting permissions, password resets, password rotation, revoking access etc. Add to this the expanding list of security patterns and protocols which may need to be supported (basic-auth, forms-based-auth, single-sign-on, WS-Sec, SAML, OAuth, multi-factor authentication etc.) and the range of vulnerabilities you need to address (buffer-overflow, virus scanning, intrusion detection etc.) and a home-grown effort can start to look pretty pathetic. Standards also evolve to improve quality and make interoperability easier and many off-the-shelf products will implement these accordingly. A home-grown effort likely won’t efficiently lend itself to being manipulated to support such standards as they evolve – in either a qualitative or interoperable way.
  3. Swiss cheese – Buying an off-the-shelf solution isn’t going to solve your problems overnight and your developers will still need to employ secure development practices in case something gets through. However, with many layers of security (especially where they come from a variety of suppliers) comes additional protection. Every product will have holes but you hope the holes don’t all line up and let something through. The more layers the better – though home-grown layers have lots of holes.
  4. Support – As new vulnerabilities are discovered you’ll have (should have) a support contract in place with the supplier to provide patches to address these as they arise. With an in-house solution you’ll probably never know you’ve a problem until it’s too late. Furthermore, the obscurity benefit is a millstone around your neck. Finding developers to support the solution will become increasingly difficult (and expensive) as the years go by. No developer wants to be left supporting pre-historic software, it’s a dead-end career role.
  5. Maintenance – As technology marches on, the environment in which you operate changes. Constant upgrades in browsers, operating systems and devices mean you’ve got to keep developing just to remain compatible with your users. The time and cost this consumes can quickly mount. The same charge can be levelled at commercial vendors, so you should ensure your supplier has given some thought to the upgrade path before you sign on the dotted line.
  6. Blame! – Ultimately if something goes wrong you’ll have someone to shout at, sue and/or get emergency fixes from. You can still do the same for in-house solutions, it’s just that you’ll be the one being screamed at and if the breach is bad enough you could well be scanning jobserve sooner than you wanted.
  7. Cost – If you can afford to fund development to address the risks, develop, maintain and support a fully featured product then by all means do so. Odds are that you can’t.

I could (and do) level many of these criticisms at any home-grown solution, not just security, but I fear security is the most heinous case as it’s taking risks with other people’s data.

Ultimately we need to ask the fundamental question as to whether it’s our business to be writing some software product or whether we should focus on our own specific goals and let someone else, someone more qualified, do that for us.

In the case of security it’s almost always better to buy off the shelf.

Amazon v Amazon

I was reading up a while back about how Amazon (and others) do differential pricing depending on who you are (or who they think you are) including factors such as the type of computer you use. As a very quick test I decided to do two searches…

One (let’s call it “open”) from my own laptop via my default browser used for everyday activity, logged in to Amazon, full of tracking cookies and history etc….

The other (“closed”) from a TOR’ified Linux VM running a clean browser, privoxy and appearing (for now) to be surfacing out of Liberia.

Both searches were for “macbook air” (not that I intend on buying one but they’re common and relatively expensive).
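
If you want to repeat the experiment without building a whole VM, something along these lines gets you most of the way (assuming tor and torsocks are installed; Amazon may well object to scripted requests, so treat it purely as an illustration – the search URL is a guess too):

    # the "open"-ish request: direct, with whatever cookies you feed it
    curl -s 'https://www.amazon.co.uk/s?k=macbook+air' -b cookies.txt -o open.html

    # the "closed" request: a clean session routed via tor
    torsocks curl -s 'https://www.amazon.co.uk/s?k=macbook+air' -o closed.html

    # compare the prices that come back
    diff <(grep -o '£[0-9,]*' open.html | sort -u) \
         <(grep -o '£[0-9,]*' closed.html | sort -u)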

Results were pretty much the same – but not quite – and the example below shows a difference…

From “open”…

…and from “closed”…

£686 v £669…

So, if proof were needed… you can’t get Amazon Prime in Liberia! (At least not from amazon.co.uk.) Because that’s really the only difference. You can still buy the device (new) for £669 on the “open” version, it’s just not the “obvious” option.

However, the first item in the “closed” search did suggest purchase via Prime as it did on the “open” version…

So there is a difference, but how significant is debatable. It may be a classic diversionary sales tactic – the top item via Prime is more expensive than another via Prime lower in the list but this one (which they want you to buy) is still more expensive than you could have got it for… If they can make you buy that one because you think it’s a bargain over the top item then it’s a good sale!

Who knows… I’m likely reading too much into this after too many beers… In any case it’s largely up to us to be conscious that such behaviour goes on and that we, as human beings, are susceptible to all sorts of strange and suspect sales tactics.

Go search elsewhere, multiple times, from other devices, in Liberia… then save the money instead.

Cash-haemorrhaging public cloud

Interesting point of view on how cloud service providers are haemorrhaging cash to sustain these models in the hope they’ll win big in the long run.

As data storage and compute costs fall they may well be able to sustain existing pricing, though I suspect ultimately they’ll need to ratchet things up. Cost comparisons are also hard to get right due to the complexity of suppliers’ pricing, and I believe the difference in architectural patterns used in the cloud versus on-premise further complicates things (something for another day).

What I do know is that there are those in the industry who cannot afford to be left behind in the race to the cloud – IBM, Microsoft and Google notably. They will likely be pumping all they can into the cloud to establish their position in the market – and to maintain their position generally…