2014/04/27

Entropy

Entropy is a measure of disorder in a system. A number of years ago I was flicking through an old book on software engineering from the 1970s. Beyond being a right riveting read, it expounded the view that software does not suffer from decay: that once set, software programs would follow the same rules over and over and produce the same results time and again, ad infinitum.

I would like to challenge this view.

We design a system, spend many weeks and months considering every edge case, crafting the code so we've handled every possible issue nature can throw at us, including those "this exception can never happen but just in case it does..." scenarios. We test it till we can test no more without actually going live and then release our latest most wondrous creation on the unsuspecting public. It works, and for a fleeting moment all is well with the universe... From this moment on decay eats away at our precious creation like rats gnawing away on the discarded carcass of the Sunday roast.

Change is pervasive, and whilst it seems reasonable enough that, were we able to precisely reproduce the starting conditions, the program would run time and again as it did the first time, this isn't correct for reasons of quantum mechanics and our inability to time travel (at least so far as we know today). However, I'll ignore the effects of quantum mechanics and time travel for now and focus on the more practical reasons for change and how this causes decay and increasing entropy in computer systems.

Firstly there's the general use of the system. Most systems have some sort of data store, if only for logging, and data is collected in increasing quantities and in a greater variety of combinations over time. This can lead to permutations which were never anticipated, exposing functional defects, or to volumes which grow beyond the planned capacity of the system. The code may remain the same, but when we consider the system as a whole, code and data together, it is continuously changing. Subsequent behaviour becomes increasingly unpredictable.
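To put a rough number on the capacity point, here is a minimal sketch, assuming a data store that grows by a fixed percentage each day. The function name and all the figures are hypothetical, made up purely for illustration.

```python
import math


def days_until_full(current_gb: float, capacity_gb: float, daily_growth: float) -> int:
    """Whole days until a store growing by `daily_growth` per day
    (e.g. 0.01 for 1%) first reaches `capacity_gb`.

    All inputs are illustrative assumptions, not measurements.
    """
    if current_gb <= 0 or daily_growth <= 0:
        raise ValueError("current_gb and daily_growth must be positive")
    if current_gb >= capacity_gb:
        return 0  # already at or over the planned capacity
    # Solve current * (1 + g)^d >= capacity for the smallest whole d.
    return math.ceil(math.log(capacity_gb / current_gb) / math.log1p(daily_growth))


print(days_until_full(100, 200, 0.01))
```

With these made-up numbers, a 100 GB store growing at 1% a day reaches a 200 GB limit in 70 days, without a single line of the application code changing.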

Secondly there's the environment the system exists within - most of which is totally beyond our control. Patches for a whole stack of components are continually released, from the hardware up. The first response from most first-line support organisations is "patch to latest level" (which is much easier said than done), but if you do manage to keep up with the game then these patches will affect how the system runs.

Conversely, if you don't patch then you leave yourself vulnerable to the defects that the patches were designed to resolve. The knowledge that the defect itself exists changes the environment in which the system runs because now the probability that someone will try to leverage the defect is significantly increased - which again increases the uncertainty over how the system will operate. You cannot win and the cost of doing nothing may be more than the cost of addressing the issue.

Then there's change that we inflict ourselves.

If you're lucky and the system has been a success then new functional requirements will arise. This is a good thing (perhaps one for later), but a system which does not functionally evolve is a dead end and essentially a failure - call it a "panda" if you wish. The business will invent new and better ways to get the best out of the system, new use cases which can be fulfilled become apparent and a flurry of activity follows. All of which changes the original system.

There's also change in non-functional requirements. Form needs a refresh every 18 months or so, security defects need to be fixed (really, they do!), performance and capacity improvements may be needed and the whole physical infrastructure needs to be refreshed periodically. The simple act of converting a physical server to virtual (aka P2V conversion), which strives to keep the existing system as close to current as possible, detritus and all, will typically provide more compute, RAM and disk than was ever considered possible. Normally this makes the old application run so much faster than before, but occasionally that speed increase can have devastating effects on the function of time-sensitive applications. Legislative requirements, keeping compliant with the latest browsers etc. all bring more change...
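As an illustration of how a speed increase can break a time-sensitive application, here is a small sketch contrasting two ways of pausing. Both function names are hypothetical and this is not taken from any real system; it simply assumes an old program that "waits" by spinning a loop, a pause that shrinks as the hardware gets faster.

```python
import time


def wait_by_counting(iterations: int) -> float:
    """Pause by spinning an empty loop.

    How long this really takes depends entirely on CPU speed, so the
    same code pauses for less time on a faster (e.g. newly virtualised) host.
    Returns the elapsed seconds actually spent.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        pass
    return time.perf_counter() - start


def wait_by_clock(seconds: float) -> float:
    """Pause until the clock says enough time has passed.

    The duration is tied to wall-clock time, not to how fast the CPU is.
    Returns the elapsed seconds actually spent.
    """
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        time.sleep(0.001)  # yield briefly instead of burning CPU
    return time.perf_counter() - start
```

Code written in the first style carries a hidden assumption about the hardware it was born on. Move it to a faster host and the pause quietly gets shorter, which is exactly the kind of change nobody thinks to test for.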

Don't get me wrong, change is normally a good thing and the last thing we want is a world devoid of change. The problem is that all this change increases the disorder (entropy) of the system. Take the simple case of a clean OS install. Day 1, the system is clean and well ordered. Examining the disk and logs shows a tidy registry and clean log and temporary directories. Day 2 brings a few patches, which add registry entries, some logs, a few downloads etc., but it's still good enough. But by Day n you've a few hundred patches installed, several thousand log files and a raft of old downloads and temporary files lying around.

The constant state of flux means that IT systems are essentially subject to the same tendency for disorder to increase as stated in the second law of thermodynamics. Change unfortunately brings disorder and complexity. Disorder and complexity make things harder to maintain and manage, increasing fragility and instability. Increased management effort results in increased costs.

Well, that's enough for today... next week, what we can do about it...

