2014/05/05

Entropy - Part 2

A week or so ago I wrote a piece on entropy and how IT systems have a tendency for disorder to increase in a similar manner to the second law of thermodynamics. This article aims to identify what we can do about it...

It would be nice if there was some silver bullet but the fact of the matter is that; like the second law, the only real way to minimise disorder is to put some work in.

1. Housekeeping

As the debris of life slowly turns your pristine home into something more akin to the local dump, so the daily churn of changes gradually slows and destabilises your previously spotless new IT system. The solution is to crack on with the weekly chore of housekeeping in both cases (or possibly daily if you've kids, cats, dogs etc.). It's often overlooked and forgotten but a lack of housekeeping is frequently the cause of unnecessary outages.

Keeping logs clean and cycling on a regular basis (e.g. hoovering), monitoring disk usage (e.g. checking you've enough milk), cleaning up temporary files (e.g. discarding those out of date tins of sardines), refactoring code (e.g. a spring clean) etc. is not difficult and there's little excuse for not doing it. Reviewing the content of logs and gathering metrics on usage and performance can also help anticipate how frequently housekeeping is required ensure smooth running of the system (e.g. you could measure the amount of fluff hoovered up each week and use this as the basis to decide which days and how frequently the hoovering needs doing - good luck with that one!). This can also lead to additional work to introduce archiving capabilities (e.g. self storage) or purging of redundant data (e.g. taking the rubbish down the dump). But like your home, a little housekeeping done frequently is less effort (cost) than waiting till you can't get into the house because the doors jammed and the men in white suits and masks are threatening to come in and burn everything.

2. Standards Compliance

By following industry standards you stand a significantly better chance of being able to patch/upgrade/enhance without pain in the future than if you decide to do your own thing.

That should be enough said on the matter but the number of times I see teams misusing APIs or writing their own solutions to what are common problems is frankly staggering. We (and me especially) all like to build our own palaces. Unfortunately we lack sufficient exposure to the space of a problem to be able to produce designs which combines elegance with flexibility to address the full range of use cases or the authority and foresight to predict the future and influence this in a meaningful way. In short, standards are generally thought out by better people than you or me.

Once a standard is established then any future work will usually try to build on this or provide a roadmap of how to move from the old standard to the new.

3. Automation

The ability to repeatedly and reliably build the system decreases effort (cost) and improves quality and reliability. Any manual step in the build process will eventually lead to some degree of variance with potentially unquantifiable consequences. There are numerous tools available to help with this (e.g. Jenkins) though unfortunately usage of such tools is not as widespread as you would hope.

But perhaps the real killer feature is test automation which enables you to continuously execute tests against the system at comparatively negligible cost (when compared to maintaining a 24x7 human test team). With this in place (and getting the right test coverage is always an issue) you can exercise the system in any number of hypothetical scenarios to identify issues; both functional and non-functional, in a test environment before the production environment becomes compromised.

Computers are very good at doing repetitive tasks consistently. Humans are very good at coming up with new and creative test cases. Use each appropriately.

Much like housekeeping, frequent testing yields benefits at lower cost than simply waiting till the next major release when all sorts of issues will be uncovered and need to be addressed - many of which may have been around a while though no-one noticed... because no-one tested. Regular penetration testing and review of security procedures will help to proactively avoid vulnerabilities as they are uncovered in the wild, and regular testing of new browsers will help identify compatibility issues before your end-users do. There are some tools to help automate in this space (e.g. Security AppScan and WebDriver) though clearly it does cost to run and maintain such a continuous integration and testing regime. However, so long as the focus is correct and pragmatic then the cost benefits should be realised.

4. Design Patterns

Much like standards compliance, use of design patterns and good practices such as abstraction, isolation and dependency injection can help to ensure changes in the future can be accommodated at minimal effort. I mention this separately though since the two should not be confused. Standards may (or may not) adopt good design patterns and equally non-standard solutions may (or may not) adopt good design patterns - there are no guarantees either way.

Using design patterns also increases the likelihood that the next developer to come along will be able to pick up the code with greater ease than if it's some weird hair-brained spaghetti bowl of nonsense made up after a rather excessive liquid lunch. Dealing with the daily churn of changes becomes easier, maintenance costs come down and incidents are reduced.

So in summary, entropy should be considered a BAU (Business as Usual) issue and practices should be put in place to deal with it. Housekeeping, standards-compliance, automation through continuous integration and use of design patterns all help to keep the impact of change minimised and keep the level of disorder down.

Next time, some thoughts on how to measure entropy in the enterprise...

1 comment:

  1. […] Well, that’s enough for today.. next week, what we can do about it… […]

    ReplyDelete

Voyaging dwarves riding phantom eagles

It's been said before... the only two difficult things in computing are naming things and cache invalidation... or naming things and som...