Performance Testing is Easy

Performance testing is easy. We just throw as many requests at the system as we can, as quickly as we want, and measure the result. Job done, right?

tl;dr? Short form…

  1. Understand the user scenarios and define tests. Review the mix of scenarios per test and the type of tests to be executed (peak, stress, soak, flood).
  2. Size and prepare the test environment and data. Consider the location of injectors and servers and mock peripheral services and systems where necessary.
  3. Test the tests!
  4. Execute and monitor everything. Start small and ramp up.
  5. Analyse results, tune, rinse and repeat until happy.
  6. Report the results.
  7. And question to what level of depth performance testing is really required…

Assuming we’ve got the tools and the environments, the execution of performance tests should be fairly simple. The first hurdle, though, is preparing for testing.

User Scenarios and Test Definitions

In order to test we first need to understand the sort of user scenarios that we’re going to encounter in production which warrant testing. For existing systems we can usually do some analysis on web-logs and the like to figure out what users are actually doing and try to model these scenarios. For this we may need a year or more of data to see if there are any seasonal variations and to understand what the growth trend looks like. For new systems we don’t have this data, so we need to make some assumptions and estimates as to what’s really going to happen. We also need to determine which of the scenarios we’re going to model and the transaction rates we want them to achieve.

When we’ve got system users calling APIs or running batch-jobs the variability is likely to be low. Human users are a different beast though and can wander off all over the place doing weird things. Modelling all scenarios can be a lot of effort (which equals a lot of cost), so a risk-based approach is usually required. Considerations here include:

  • Picking the top few scenarios that account for the majority of activity. It depends on the system, but I’d suggest keeping these scenarios down to <5 – the fewer the better so long as it’s reasonably realistic.
  • Picking the “heavy” scenarios which we suspect are most intensive for the system (often batch jobs and the like).
  • Introducing noise to tests to force the system into doing things it wouldn’t normally be doing. This sort of thing can be disruptive (e.g. a forced load of a library not otherwise used may be just enough to push the whole system over the edge in a catastrophic manner).

We next need to consider the relative mix of user scenarios for our tests (60% of users executing scenario A, 30% doing scenario B, 10% doing scenario C etc.) and the combinations of scenarios we want to consider (running scenarios A, B and C versus A, B and C plus batch job Y).
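
To make the mix concrete, here’s a minimal Python sketch of how a test harness might allocate virtual users to scenarios – the scenario names and weights are purely illustrative, not from any real system:

import random
from collections import Counter

# Hypothetical scenario mix for a peak load test.
SCENARIO_MIX = {
    "browse_catalogue": 0.60,  # scenario A: 60% of users
    "search_and_view": 0.30,   # scenario B: 30% of users
    "checkout": 0.10,          # scenario C: 10% of users
}

def pick_scenario():
    # Pick a scenario for the next virtual user according to the weights.
    names = list(SCENARIO_MIX)
    weights = list(SCENARIO_MIX.values())
    return random.choices(names, weights=weights, k=1)[0]

# Sanity check: over many virtual users the observed mix should be close to the target.
print(Counter(pick_scenario() for _ in range(10_000)))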

Some of these tests may not be executed for performance reasons but for operability – e.g. what happens if my backup runs when I’m at peak load, or when a node in a cluster fails?

We also need test data.

For each scenario we should be able to define the test data requirements. This is stuff like user-logins, account numbers, search terms etc.

Just getting 500 test user logins set up can be a nightmare. The associated test authentication system may not have the capacity to handle the number of logins or accounts and we may need to mock it out. It’s all too common for peripheral systems not to be in a position to support performance testing as we’d like, and in any case we may want something more reliable when testing. For any mock services we do decide to build we need to work out how they should respond and what their performance should look like (it’s no good having a mock service return in 0.001 seconds when the real thing takes 1.8 seconds).

Account numbers have security implications and we may need to create dummy data. Search terms, especially from humans, can be wild and wonderful – returning millions or zero records in place of the expected handful.

In all cases, we need to prepare the environment based on the test data we’re going to use and size it correctly. Size it? Well, if production is going to have 10 million records it’s not much good testing with 100! Copies of production data, possibly obfuscated, can be useful for existing systems. For new systems, though, we need to create the data. Here be dragons. The distribution of randomly generated data almost certainly won’t match that of real data – there are far more instances of surnames like Smith, Jones, Taylor, Williams or Brown than there are like Zebedee. If the distribution isn’t correct then the test may be invalid (e.g. we may hit one shard or tablespace and its associated nodes and disks too little or too much).
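
As a small illustration of the distribution problem, here’s a rough Python sketch contrasting naive uniform generation with a weighted draw – the names and weights are made up for the example, not real census figures:

import random

# Illustrative relative weights: common surnames dominate, rare ones form a long tail.
SURNAME_WEIGHTS = {
    "Smith": 500, "Jones": 400, "Taylor": 300, "Williams": 290,
    "Brown": 280, "Patel": 200, "Khan": 150, "Zebedee": 1,
}

def realistic_surname():
    # Weighted draw: skewed like real data, so indexes, shards and caches
    # get exercised much as they would in production.
    names = list(SURNAME_WEIGHTS)
    weights = list(SURNAME_WEIGHTS.values())
    return random.choices(names, weights=weights, k=1)[0]

def naive_surname():
    # Uniform draw: what naive random test data generation tends to do.
    return random.choice(list(SURNAME_WEIGHTS))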

I should point out here that there’s a shortcut for some situations. For existing systems with little in the way of stringent security requirements, no real functional changes and idempotent requests (think application upgrades or hardware migrations of primarily read-only websites), replaying the legacy web-logs may be a valid way to test. It’s cheap, quick and simple – if it’s viable.

We should also consider the profile and type of tests we want to run. Each test profile has three parts: the ramp-up time (how long it takes to reach the target volume), the steady-state time (how long the test runs at that level) and the ramp-down time (how quickly we close the test down – we usually care little for this and can close the test down quickly, but in some cases we want a nice clean shutdown). A simple sketch of these profiles as code follows the list below. In terms of test types there are:

  • Peak load test – Typically a 1 to 2 hr test at peak target volumes. e.g. Ramp-up 30 mins, steady-state 2 hrs, ramp-down 5 mins.
  • Stress test – A longer test continually adding load beyond peak volumes to see how the system performs under excessive load and potentially where the break point is. e.g. Ramp-up 8 hrs, steady-state 0 hrs, ramp-down 5 mins.
  • Soak test – A really long test running for 24 hrs or more to identify memory leaks and the impact of peripheral/scheduled tasks. e.g. Ramp-up 30 mins, steady-state 24 hrs, ramp-down 5 mins.
  • Flood test (aka Thundering Herd) – A short test where all users arrive in a very short period. In this scenario we can often see chaos ensue initially but the environment settling down after a short period. e.g. Ramp-up 0 mins, steady-state 2 hrs, ramp-down 5 mins.
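
As promised above, here’s a simple Python sketch of how these profiles might be captured for a test harness – the times mirror the examples above, while the target user counts are invented purely for illustration:

from dataclasses import dataclass

@dataclass
class LoadProfile:
    # A simple description of a test profile (times in minutes).
    name: str
    ramp_up: int
    steady_state: int
    ramp_down: int
    target_users: int

# Times mirror the examples above; user counts are made up.
PROFILES = [
    LoadProfile("peak",   ramp_up=30,  steady_state=120,  ramp_down=5, target_users=500),
    LoadProfile("stress", ramp_up=480, steady_state=0,    ramp_down=5, target_users=2000),
    LoadProfile("soak",   ramp_up=30,  steady_state=1440, ramp_down=5, target_users=500),
    LoadProfile("flood",  ramp_up=0,   steady_state=120,  ramp_down=5, target_users=500),
]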

So we’re now ready to script our tests. We have the scenarios, we know the transaction volumes, we have test data, our environment is prep’d and we’ve mocked out any peripheral services and systems.

Scripting

There are many test tools available, from the free Apache JMeter and Microsoft web stress tools, to commercial products such as HP LoadRunner and Rational Performance Tester, to cloud-based solutions such as Soasta or Blitz. Which tool we choose depends on the nature of the application and our budget. Cloud tools are great if we’re hosting in the public cloud, not so good if we’re an internal service.

The location of the load injectors (the servers which run the actual tests) is also important. If these are sitting next to the test server we’ll get different results than if the injector is running on someone’s laptop connected via a VPN tunnel over a 256kbit ADSL line somewhere in the Scottish Highlands. Which case is more appropriate will depend on what we’re trying to test and where we consider the edge of our responsibility to lie. We have no control over the sort of devices and connectivity internet users have, so perhaps our responsibility stops at the point of ingress into our network? Or perhaps it’s a corporate network and we’re only concerned with the point of ingress into our servers? We do need to design and work within these constraints, so measuring and managing page weight and latency is always a concern, but we don’t want the complexity of all that “stuff” out there which isn’t our responsibility weighing us down.

Whichever tool we choose, we can now complete the scripting and get on with testing.

Testing

Firstly, check everything is working. Run the scripts with a single user for 20 minutes or so to ensure things are responding as expected and that the transaction load is correct. This will ensure that as we add more users we’re scaling as we want and that the scripts aren’t themselves defective. We then quite quickly ramp the tests up: 1 user, 10 users, 100 users etc. This helps to identify any concurrency problems early on with fewer users than expected (which can add too much noise and make it hard to see what’s really going on).

If we’ve an existing system, once we know the scripts work we will want to get a baseline from the legacy system to compare to. This means running the tests on the legacy system. What? Hang on! This means we need another instance of the system available running the old codebase with similar test data and similar, but possibly not identical, scripts! Yup. That it does.

If we’ve got time-taken logging enabled (%D for Apache mod_log_config) then we could get away with comparing the old production response times with the new system so long as we’re happy the environments are comparable (same OS, same types of nodes, same spec, same topology, NOT necessarily the same scale in terms of numbers of servers) and that the results are showing the same thing (which depends on what upstream network connectivity is being used). But really, a direct comparison of test results is better – comparing apples with apples.
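
If we do go down the %D route, a rough Python sketch of pulling response times out of such a log might look like the following – it assumes a LogFormat ending in %D (microseconds) and will need adjusting to whatever format is actually in use:

import re
import statistics

# Assumes a format along the lines of:
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
# i.e. each line ends with status, bytes and time-taken in microseconds.
LINE = re.compile(r'"(?P<request>[^"]+)" (?P<status>\d{3}) \S+ (?P<micros>\d+)$')

def response_times(path):
    # Yield (request, response time in seconds) for each matching log line.
    with open(path) as log:
        for line in log:
            m = LINE.search(line)
            if m:
                yield m.group("request"), int(m.group("micros")) / 1_000_000

def summarise(path):
    times = [t for _, t in response_times(path)]
    return {
        "count": len(times),
        "mean": statistics.mean(times),
        "p95": statistics.quantiles(times, n=20)[-1],  # 95th percentile
        "max": max(times),
    }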

We also need to consider what to measure and monitor. We are probably interested in:

  • For the test responses:
    • Average, max, min and 95th percentile for the response time per request type (a sketch for computing these from raw results follows the list).
    • Average, max, min size for page weight.
    • Response codes – 2xx/3xx probably good, lots of 4xx/5xx suggests the test or servers are broken.
    • Network load and latency.
  • For the test servers:
    • CPU, memory, disk and network utilisation throughout the test run.
    • Key metrics from middleware: queue depths, cache-hit rates, JVM garbage collection (note that JVM memory will look flat at the server level so needs some JVM monitoring tools). These will vary depending on the middleware, and for databases we’ll want a DBA to advise on what to monitor.
    • Number of sessions.
    • Web-logs and other log files.
  • For the load injectors:
    • CPU, memory, disk and network utilisation throughout the test run. Just to make sure it’s not the injectors that are overstretched.
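
As mentioned in the list above, a quick way of getting the per-request response time statistics out of a results file is sketched below in Python – it assumes a JMeter-style CSV with 'label', 'elapsed' (milliseconds) and 'responseCode' columns, so adjust the column names for whatever tool you use:

import csv
from collections import defaultdict

def per_request_stats(results_csv):
    # Aggregate response times (ms) and error counts per request type.
    samples = defaultdict(list)
    errors = defaultdict(int)
    with open(results_csv, newline="") as f:
        for row in csv.DictReader(f):
            samples[row["label"]].append(int(row["elapsed"]))
            if not row["responseCode"].startswith(("2", "3")):
                errors[row["label"]] += 1
    for label, times in samples.items():
        times.sort()
        yield label, {
            "count": len(times),
            "min": times[0],
            "avg": sum(times) / len(times),
            "p95": times[int(0.95 * (len(times) - 1))],
            "max": times[-1],
            "errors": errors[label],
        }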

And finally we can test.

Analysis and Tuning

It’s important to verify that the test achieved the expected transaction rates and usage profiles. Reviewing log files to ensure there are no errors, and web-logs to confirm transaction rates and request types, helps verify that all was correct before we start to review response times and server utilisation.

We can then go through the process of correlating test activity with utilisation, identifying problems and limits near capacity (JVM memory for example), and extrapolating for production – for which some detailed understanding of the scaling nature of the system is required.

It’s worth noting that whilst tests rarely succeed first time, in my experience it’s just as likely to be an issue with the test as it is with the system itself. It’s therefore necessary to plan to execute tests multiple times. A couple of days is normally not sufficient for proper performance testing.

All performance test results should be documented for reporting and future needs. Already having an understanding of why certain changes have been made, and a baseline to compare to the next time the tests are run, is invaluable. It’s not war-and-peace, just a few pages of findings in a document or wiki. Most test tools will also export the results to a PDF which can be attached to keep track of the detail.

Conclusion?

This post is already too long, but one thing to question is… is it worth the effort?

A Zipf-like distribution exists across systems and few really have that significant a load. Most handle a few transactions a second, if that. I wouldn’t suggest “no performance testing”, but I would suggest sizing the effort depending on the criticality and expected load. Getting a few guys in the office to hit F5 whilst we eyeball the CPU usage may well be enough. In code we can also include timing metrics in unit tests and execute these a few thousand times in a loop to see if there’s any cause for concern. Getting the engineering team to consider and monitor performance early on can help avoid issues later and reduce the need for multiple performance test iterations.
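
For example, a crude timing check in a unit test might look something like the sketch below – the module, function and budget are entirely hypothetical, and this is a smoke test rather than a substitute for proper load testing:

import time
import unittest

from myapp.pricing import calculate_quote  # hypothetical function under test

class QuotePerformanceSmokeTest(unittest.TestCase):

    ITERATIONS = 5_000
    BUDGET_SECONDS = 0.002  # assumed per-call budget; tune to your own system

    def test_calculate_quote_stays_within_budget(self):
        start = time.perf_counter()
        for _ in range(self.ITERATIONS):
            calculate_quote(items=3, customer_tier="standard")
        per_call = (time.perf_counter() - start) / self.ITERATIONS
        self.assertLess(per_call, self.BUDGET_SECONDS,
                        f"average {per_call:.6f}s per call exceeds budget")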

Critical systems with complex transactions or an expected high load (which as a rough guide I would say is anything around 10tps or more) should be tested more thoroughly. Combining capacity needs with operational needs informs the decision – four 9’s and 2k tps is the high end from my experience – and a risk based approach should always be used when considering performance testing.

Go-Daddy: Low TTL DNS Resolution Failures

Some of you may have noticed recently that www.nonfunctionalarchitect.com was not resolving correctly much of the time. At first I thought this was down to DNS replication taking a while, though that shouldn’t really explain inconsistent results from the same DNS servers (once picked up they should stick, assuming I don’t change the target, which I hadn’t).

So eventually I called Go-Daddy support who weren’t much help and kept stating that “it works for us” suggesting it was my problem. This despite confirmation from friends and colleagues that they see the same issue from a number of different ISPs. They also didn’t want to take the logs I’d captured demonstrating the problem or give me a reference number – a far cry from the recorded message in the queue promising to “exceed my expectations”! But hey, they’re cheap…

Anyway… I’d set the TTL (Time To Live) on my DNS records to 600 seconds. This is something I’ve done since working on migration projects where you’d want DNS TTL to be short to minimise the time clients point at the old server (note: you need to make the change at least 1x the legacy TTL value before you start the migration… and not every DNS server obeys your TTL… but it’s still worth doing). This isn’t an insane value normally but really depends on whether your nameservers can handle the increased load. I asked the support guy if this was ok and he stated that it was and all looked fine with my DNS records… Cool, except that my problem still existed!

I had to try something, so I set up a simple shell script on a couple of servers to perform a lookup (nslookup) on Google.com, www.nonfunctionalarchitect.com and pop.nonfunctionalarchitect.com, and set the TTL to 1 day on www and 600 secs on pop. This should hopefully prove that (a) DNS resolution is working (Google.com resolves), (b) that I am suffering a problem on www and pop and, with a bit of luck, (c) demonstrate if increasing the TTL makes any difference.
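
The original was a shell script, but for illustration a rough Python equivalent might look like this – it simply loops, shells out to nslookup and counts failures:

import subprocess
import time

HOSTS = ["google.com", "www.nonfunctionalarchitect.com",
         "pop.nonfunctionalarchitect.com"]

failures = {h: 0 for h in HOSTS}
attempts = 0

for _ in range(100):  # roughly 100 samples, one per minute
    attempts += 1
    for host in HOSTS:
        result = subprocess.run(["nslookup", host], capture_output=True, text=True)
        if result.returncode != 0 or "NXDOMAIN" in (result.stdout + result.stderr):
            failures[host] += 1
    time.sleep(60)

print(f"attempts: {attempts}")
for host, count in failures.items():
    print(f"{host}: {count} failures")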

The result shows no DNS resolution failures for either Google.com or www.nonfunctionalarchitect.com. On the other hand pop fails around 10% of the time (12 failures from 129 requests). Here are a few of the results:

pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
** server can't find pop.nonfunctionalarchitect.com: NXDOMAIN
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.
pop.nonfunctionalarchitect.com	canonical name = pop.secureserver.net.

I can think of a number of reasons why this may be happening, including load on the Go-Daddy nameservers, overly aggressive DoS counter-measures, or misaligned nameserver/CNAME configuration. The configuration seems ok, but I do wonder about the nameservers’ 1 hr TTL versus the CNAME’s 600 s TTL. For now it seems more stable at least and I’ll do some experimentation with TTL values later to see if I can pin this down.

In the meantime, if you’re getting DNS resolution failures with Go-Daddy and have relatively low TTL values set (<1hr) then consider increasing these to see if that helps.

Windows 7 Incident

Having recently been responsible for an estate-wide software upgrade programme for many thousand devices to Windows 7, I sympathise but have to find this amusing. However, it is an interesting approach to achieving a refresh in particularly short order… Make the best of it guys, treat it as an opportunity to audit your estate… I do hope your backup procedures are working though… 😉

Entropy – Part 2

A week or so ago I wrote a piece on entropy and how IT systems have a tendency for disorder to increase in a similar manner to the second law of thermodynamics. This article aims to identify what we can do about it…

It would be nice if there were some silver bullet, but the fact of the matter is that, like the second law, the only real way to minimise disorder is to put some work in.

1. Housekeeping

As the debris of life slowly turns your pristine home into something more akin to the local dump, so the daily churn of changes gradually slows and destabilises your previously spotless new IT system. The solution is to crack on with the weekly chore of housekeeping in both cases (or possibly daily if you’ve kids, cats, dogs etc.). It’s often overlooked and forgotten but a lack of housekeeping is frequently the cause of unnecessary outages.

Keeping logs clean and cycled on a regular basis (e.g. hoovering), monitoring disk usage (e.g. checking you’ve enough milk), cleaning up temporary files (e.g. discarding those out of date tins of sardines), refactoring code (e.g. a spring clean) etc. is not difficult and there’s little excuse for not doing it. Reviewing the content of logs and gathering metrics on usage and performance can also help anticipate how frequently housekeeping is required to ensure smooth running of the system (e.g. you could measure the amount of fluff hoovered up each week and use this as the basis to decide which days and how frequently the hoovering needs doing – good luck with that one!). This can also lead to additional work to introduce archiving capabilities (e.g. self storage) or purging of redundant data (e.g. taking the rubbish down the dump). But like your home, a little housekeeping done frequently is less effort (cost) than waiting till you can’t get into the house because the door’s jammed and the men in white suits and masks are threatening to come in and burn everything.
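
By way of illustration, a rough Python sketch of the kind of housekeeping job meant here – the paths, age limit and threshold are placeholders to adapt to your own system:

import shutil
import time
from pathlib import Path

LOG_DIR = Path("/var/log/myapp")  # hypothetical application log directory
MAX_AGE_DAYS = 30
DISK_ALERT_THRESHOLD = 0.80       # warn when the log volume is 80% full

def purge_old_logs():
    # Delete rotated log files older than MAX_AGE_DAYS.
    cutoff = time.time() - MAX_AGE_DAYS * 86_400
    for logfile in LOG_DIR.glob("*.log.*"):
        if logfile.stat().st_mtime < cutoff:
            logfile.unlink()

def disk_usage_high():
    # True if usage on the log volume is above the alert threshold.
    usage = shutil.disk_usage(LOG_DIR)
    return usage.used / usage.total > DISK_ALERT_THRESHOLD

purge_old_logs()
if disk_usage_high():
    print("WARNING: log volume above threshold - investigate growth or archive")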

2. Standards Compliance

By following industry standards you stand a significantly better chance of being able to patch/upgrade/enhance without pain in the future than if you decide to do your own thing.

That should be enough said on the matter, but the number of times I see teams misusing APIs or writing their own solutions to what are common problems is frankly staggering. We (and I especially) all like to build our own palaces. Unfortunately we lack sufficient exposure to the space of a problem to be able to produce designs which combine elegance with the flexibility to address the full range of use cases, or the authority and foresight to predict the future and influence it in a meaningful way. In short, standards are generally thought out by better people than you or me.

Once a standard is established then any future work will usually try to build on this or provide a roadmap of how to move from the old standard to the new.

3. Automation

The ability to repeatedly and reliably build the system decreases effort (cost) and improves quality and reliability. Any manual step in the build process will eventually lead to some degree of variance with potentially unquantifiable consequences. There are numerous tools available to help with this (e.g. Jenkins) though unfortunately usage of such tools is not as widespread as you would hope.

But perhaps the real killer feature is test automation, which enables you to continuously execute tests against the system at comparatively negligible cost (when compared to maintaining a 24×7 human test team). With this in place (and getting the right test coverage is always an issue) you can exercise the system in any number of hypothetical scenarios to identify issues, both functional and non-functional, in a test environment before the production environment becomes compromised.

Computers are very good at doing repetitive tasks consistently. Humans are very good at coming up with new and creative test cases. Use each appropriately.

Much like housekeeping, frequent testing yields benefits at lower cost than simply waiting till the next major release when all sorts of issues will be uncovered and need to be addressed – many of which may have been around a while though no-one noticed… because no-one tested. Regular penetration testing and review of security procedures will help to proactively avoid vulnerabilities as they are uncovered in the wild, and regular testing of new browsers will help identify compatibility issues before your end-users do. There are some tools to help automate in this space (e.g. Security AppScan and WebDriver) though clearly it does cost to run and maintain such a continuous integration and testing regime. However, so long as the focus is correct and pragmatic then the cost benefits should be realised.

4. Design Patterns

Much like standards compliance, use of design patterns and good practices such as abstraction, isolation and dependency injection can help to ensure changes in the future can be accommodated at minimal effort. I mention this separately though since the two should not be confused. Standards may (or may not) adopt good design patterns and equally non-standard solutions may (or may not) adopt good design patterns – there are no guarantees either way.

Using design patterns also increases the likelihood that the next developer to come along will be able to pick up the code with greater ease than if it’s some weird hare-brained spaghetti bowl of nonsense made up after a rather excessive liquid lunch. Dealing with the daily churn of changes becomes easier, maintenance costs come down and incidents are reduced.

So in summary, entropy should be considered a BAU (Business as Usual) issue and practices should be put in place to deal with it. Housekeeping, standards-compliance, automation through continuous integration and use of design patterns all help to keep the impact of change minimised and keep the level of disorder down.

Next time, some thoughts on how to measure entropy in the enterprise…

Feedback – Logging and Monitoring

It seems to me that we are seeing an increasing number of issues such as this reported by the Guardian. A lost transaction results in a credit default against an individual with the result that they cannot obtain a mortgage to buy a house. Small error for the company, huge impact for the individual.

The company admitted that despite the request being submitted on their website they did not receive the request!? So either the user pressed submit then walked away without noting the response was something other than “all ok!” or the response was “all ok!” and the company failed to process the request correctly.

If the former then, well, user error for being a muppet… As end users we all need to accept some responsibility and check that we get the feedback we expect.

For the latter, there are several reasons why subsequent processing could have failed. Poor transaction management so the request never gets committed, poor process management so the request drops into some dead queue never to be dealt with (either through incompetence or through malicious intent), or through system failure and a need to rollback with resulting data loss.

With the growth in IT over the past couple of decades there are bound to be some quality issues as the result of ever shorter and more demanding deadlines and ever tighter budgets. Time and effort needs to be spent exploring the hypothetical space of what could go wrong so that at least some conscious awareness and acceptance of the risks is achieved. This being the case I’m usually quite happy to be overruled by the customer on the basis of cost and time pressures.

However, it’s often not expensive to put in place some logging and monitoring – and in this case there must have been something for the company to admit the request had been submitted. Web logs, application logs, database logs etc. are all valuable sources of information when PD’ing (Problem Determination). You do though need to spend at least some time reviewing and auditing these so you can identify issues and deal with them accordingly.

I remember one case where a code change was rushed out which never actually committed any transaction. Fortunately we had the safety net of some judicious logging which allowed us to recover and replay the transactions. WARNING: It worked here but this isn’t always a good idea!
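
By way of illustration only, a minimal Python sketch of the sort of judicious logging meant here: write each incoming request to an append-only log before processing, so that if the downstream commit is later found to be broken the requests can, with great care, be replayed. The file name and handler are hypothetical:

import json
import time

AUDIT_LOG = "requests.log"  # hypothetical append-only audit log

def record_request(payload: dict):
    # Append the incoming request before attempting to process it.
    entry = {"received_at": time.time(), "payload": payload}
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(entry) + "\n")

def replay_requests(handler):
    # Re-run logged requests through a handler. Only do this when you are
    # certain the originals were never committed - as above, it worked in
    # that case but it isn't always a good idea.
    with open(AUDIT_LOG) as log:
        for line in log:
            handler(json.loads(line)["payload"])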

In general though, logging and monitoring are a very good idea. In some cases system defects will be identified, in others, transient issues will be found which may require further work to deal with them temporarily. Whatever the underlying issue it’s important to incorporate feedback and quality controls into the design of systems to identify problems before they become disasters. At a rudimentary level logging can help with this but you need to close the feedback loop through active monitoring with processes in place to deal with incidents when they arise. It really shouldn’t just be something that we only do when the customer complains.

I don’t know the detail of what happened in this case. It could have been user-error, or we could applaud the company for having logging in place, or they could just have got lucky. In any case, we need to get feedback on how systems and processes are performing and operating in order to deal with issues when they arise, improve quality and indeed the business value of the system itself through continuous improvement.

Reuse

Reuse! My favourite subject. Now, are you sitting comfortably? Then I’ll begin…

Once upon a time in a land far far away, the king of computer-land was worried. Very worried indeed. His silly prime minister had borrowed lots and lots of money to build lots of new computer systems and programs and now they couldn’t pay the interest on the debt. Worse still, none of the systems worked together and the land was becoming a confusing mess and there were lots of traffic jams and angry people. No-one knew how it worked and everything kept breaking. It was very expensive and it didn’t work. The villagers were not happy and were threatening to chop the heads off the king and queen because they were French.

Then one day, a strange young prince claiming to be from a neighbouring country arrived bringing promises to sort out all the mess and clean the country up. And what’s more, he’d do it cheaply and the country would have more money and better computer systems. The king and queen were very happy and the villagers were pleased as well – although they still wanted to chop the heads off the king and queen, because they were French and it would be fun.

So they listened to the prince and liked his ideas and gave him all the money they had left. The prince was going to build even more computer systems but they would all be based on the same design so would be cheap and quick to build. This meant he could spend more money on the design so it would be very good as well as cheap to build.

Then the prince said that he could also make a large hotel and everyone could live under the same roof. This would save on roofs because there would only be one and would be cheaper to run because there would only be one electricity bill. The villagers liked this because they liked going on holiday. The king and queen liked this because they had decided to go on holiday and so the villagers could not chop off their heads even though they were French.

Then the prince started to design the computer systems. He decided to start with the post-box because everyone sent letters. So he spoke to Granny Smith and Mrs Chatterbox about what they needed. They liked his design. It was round and red and pretty – it looked a bit like the old post-boxes.

Then he spoke to the bookshop keeper who didn’t like his design because it was too small for him to post a book. So the prince made it bigger, much bigger.

Then he spoke to the postman who didn’t like it because it was too big and would give him too many parcels to carry but the prince decided to ignore the postman because he was clearly an idiot.

So two of the postboxes were built (one in case the other was full) and the villagers liked them a lot even though the postman did not.

Next the prince decided to build the hotel, so he asked the villagers how they would like their room to look, because there could only be one design. Some wanted it round, some square, some with a balcony, some with stairs… and everyone wanted an en-suite with bidet even if they did not know how to use it. So the prince designed a flexible framework consisting of transformable panels which could be positioned wherever the villager chose. No-one liked the tents and the bidet was missing. The villagers were very angry and started to build a guillotine because they were French.

Then some of the villagers started to place their tents at the entrance to the hotel so they could get out quickly. But this stopped other villagers from coming in so made them angry. Then another villager blocked the toilet and all the villagers were angry and the hotel staff decided to go on strike because they were French and they hadn’t had a strike yet.

So the villagers decided to summon the king and queen to come back from holiday and sort out the mess. So they each sent a letter recorded delivery. But the postbox didn’t understand what “recorded delivery” meant because it was just a big round red box, and the postman didn’t want to pick up all the letters anyway because there were too many to carry and they hadn’t paid the postage. So the king and queen didn’t return to sort out the mess and the villagers were apoplectic with rage.

So the villagers burnt all the tents and drowned the postman and the prince in the river. Then the king and queen returned from holiday to find the city on fire and lots of angry villagers carrying pitchforks and pointing to a guillotine. But the king and queen were fat and so couldn’t run away. So the villagers decided to form a republic and elected the prime minister to become the president. The president chopped off the heads of the king and queen and the villagers were happy, so they gave a Gallic shrug, because they were French, and lived happily for the next week or so…

All of which begs the question… what’s this got to do with reuse?

Well, two things.

  1. Design reuse requires good design to be successful. And for the design to be good there must be lots of consistent requirements driving it. All too often reuse programs are based on the notion of “build it and they will come”, where a solution is built for a hypothetical problem which it’s believed many requirements face. Often the requirements don’t align and a lot of money is spent designing a multi-functional beast which tries, and often fails, to do too much, which increases complexity and so increases cost. The additional effort needed to consider multiple requirements from disparate systems significantly increases design, build and maintenance costs. To make this worse, true cases of reuse are often common problems in the wider industry, and so industry standard solutions and design patterns may exist which have been thought out by smarter people than you or me. To tackle these in-house is tantamount to redesigning the wheel… generally badly.
  2. Instance reuse sounds like a great idea – you can save on licenses, on servers and other resources – but this creates undesirable dependencies which are costly to resolve and act to slow delivery and reduce ease of maintenance. Furthermore, resource savings are often limited as you’ll only save on a narrow portion of the overall requirements – you’ll need more compute, more memory and more storage. Getting many parties to agree to changes is also time-consuming and consequently costly and makes management of the sum more of a headache than it need be.

Personally I believe that if you’re going to progress a reusable asset program you need to validate that there really exist multiple candidate usage scenarios (essentially the cost of designing and building a reusable asset must be less than the cost of designing and building n assets individually), that requirements are consistent and that you’re not reinventing the wheel. If this is the case, then go for it. Alternatively you may find an asset harvesting program to review and harvest “good” assets may yield better results, technically as well as being more efficient and cost effective. Then there’s the view that all reuse is opportunistic in so much as using something designed to be “re”used is really just “use”, and not “reuse” – as I once noted, “wearing clean underpants is ‘use’, turning them inside out and back to front is ‘reuse’”.

In terms of instance reuse, in my view it’s often not worth saving a few licenses given the headaches that result from increased dependencies between what should be independent components. The problem is complicated by hardware, rack space and power consumption, so it is often not clear cut and some compromise is needed. However, the silver bullet here is virtualisation, where a hypervisor can allocate and share resources dynamically, allowing you to squeeze many virtual machines onto one physical machine. License agreements may allow licensing at the physical CPU level instead of the virtual CPU, which can then be over-allocated so you can have many guest instances running on fewer host processors. This isn’t always the case of course and the opposite may be cheaper, so this needs careful review of licensing and other solution costs.

Reuse isn’t always a good idea, and the complexities needed in design and build, and the additional dependencies that result, may outweigh the costs of just doing it n times in the first place. Use of standards, design patterns and harvesting of good assets should be the first front in trying to improve quality and reduce costs. Any justification for creating reusable assets should include comparative estimates of the costs involved, including the ongoing cost to operations.