
Mad Memoization (or how to make computers make mistakes)

Memoization is a technique used to cache the results of computationally expensive functions so that subsequent calls with the same inputs are faster, improving performance and throughput. It can be implemented in a variety of languages but is perhaps best suited to functional programming languages, where the response to a function should be consistent for a given set of input values. It's a nice idea and has some uses, but perhaps isn't all that common since we tend to design programs so that we only call such functions once, when needed, in any case.
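For reference, plain old memoization in Python can be sketched with the standard library's `functools.lru_cache`. The function `slow_square` and the `CALLS` counter are hypothetical, there only to show that the real work happens once per distinct input:

```python
import functools

CALLS = {"count": 0}  # hypothetical counter, only here to show when real work happens


@functools.lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive, deterministic function."""
    CALLS["count"] += 1
    return n * n


if __name__ == "__main__":
    print(slow_square(12))  # computed: 144
    print(slow_square(12))  # returned from the cache: 144
    print(CALLS["count"])   # 1
```

Note that the cache is keyed on the arguments, which is exactly why it only pays off when the same inputs come round again.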

I have a twist on this. Rather than remembering the response to a function for a particular set of values, remember the responses to a function and just make a guess at the response next time.

A guess could be made based on the entropy of the input and/or output values. For example, where the response is a boolean value (true or false) and you find that 99% of the time the response is "true" but it takes 5 seconds to work this out, then... to hell with it, just return "true" and don't bother with the computation. Lazy, I know.
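A minimal sketch of the idea, assuming we only track the distribution of a function's *outputs* (ignoring its inputs entirely) and measure how predictable they are with Shannon entropy in bits. The decorator name `guessing`, its parameters `max_entropy`, `min_samples` and `resample_every`, and the example function `is_valid` are all hypothetical choices for illustration, not an established API:

```python
import math
from collections import Counter
from typing import Callable, Hashable


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of an observed distribution of outputs."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def guessing(max_entropy: float = 0.1, min_samples: int = 100, resample_every: int = 50):
    """Hypothetical decorator: once a function's outputs look predictable
    (entropy <= max_entropy over at least min_samples real calls), return the
    most common output instead of calling it. Every resample_every-th call
    still runs the real function so the statistics can keep updating."""
    def decorate(fn: Callable[..., Hashable]) -> Callable[..., Hashable]:
        seen: Counter = Counter()
        calls = 0

        def wrapper(*args, **kwargs):
            nonlocal calls
            calls += 1
            confident = (sum(seen.values()) >= min_samples
                         and shannon_entropy(seen) <= max_entropy)
            if confident and calls % resample_every != 0:
                return seen.most_common(1)[0][0]  # a guess: may well be wrong
            result = fn(*args, **kwargs)  # the expensive, side-effect-free call
            seen[result] += 1
            return result

        wrapper.seen = seen  # exposed for inspection
        return wrapper
    return decorate


@guessing(max_entropy=0.1, min_samples=100)
def is_valid(n: int) -> bool:
    """Hypothetical stand-in for a slow check that is true 99% of the time."""
    return n % 100 != 0
```

Calling `is_valid` for 1 to 100 records 99 × `True` and 1 × `False` (an entropy of roughly 0.08 bits), so the next call, `is_valid(200)`, returns the guessed `True` even though the real answer is `False`. That's the mistake-by-design in action.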

Of course, some of the time the response would be wrong, but that's the price you pay for improved performance and throughput.

There would be some (possibly significant) cost to determining the entropy of inputs/outputs, and any function that modifies the internal state of the system (i.e. is not idempotent) should be excluded from such treatment for obvious reasons. You'd also only really want to rely on such behaviour when the system is already busy and nearly overloaded and you need a way to get through the backlog quickly. Think of it like the exit gates of a rock concert when a fire breaks out: you want to ditch the "check-every-ticket" protocol in favour of a "let-everyone-out-asap" solution, and fast.
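Here's a rough sketch of gating the guesswork on load, assuming the number of calls currently in flight is a good enough proxy for "busy" (a real system might look at queue depth, latency or CPU instead). `LoadAwareGuesser`, `busy_threshold` and `min_confidence` are hypothetical names, and it should only wrap functions without side effects:

```python
import threading
from collections import Counter
from typing import Callable, Hashable


class LoadAwareGuesser:
    """Hypothetical wrapper: computes normally while the system is quiet, but
    once `busy_threshold` or more calls are in flight it returns the most
    common answer seen so far (if that answer is common enough) instead."""

    def __init__(self, fn: Callable[..., Hashable], busy_threshold: int = 8,
                 min_confidence: float = 0.95):
        self.fn = fn
        self.busy_threshold = busy_threshold
        self.min_confidence = min_confidence
        self.in_flight = 0
        self.seen: Counter = Counter()
        self.lock = threading.Lock()

    def _guess(self):
        """Return (answer, share of all observations) or None if nothing seen yet."""
        total = sum(self.seen.values())
        if not total:
            return None
        answer, count = self.seen.most_common(1)[0]
        return answer, count / total

    def __call__(self, *args, **kwargs):
        with self.lock:
            busy = self.in_flight >= self.busy_threshold
            guess = self._guess()
            if busy and guess and guess[1] >= self.min_confidence:
                return guess[0]  # floodgates open: skip the work entirely
            self.in_flight += 1
        try:
            result = self.fn(*args, **kwargs)
        finally:
            with self.lock:
                self.in_flight -= 1
        with self.lock:
            self.seen[result] += 1
        return result
```

While things are quiet every call does the real work (and feeds the statistics); only when the in-flight count crosses the threshold does it start handing out the majority answer.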

You could even complicate the process a little further and employ a decision tree (based on information gain, for example) when trying to determine the response to a particular set of inputs.
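A sketch of the information-gain part, ID3-style: pick the input position whose value best separates the observed outputs, then guess the majority output for that value. It stops at one level (a decision stump) rather than recursing into a full tree, and `build_stump` plus the tier/region call history are hypothetical:

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a list of observed outputs."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(inputs: Sequence[Row], outputs: Sequence[Hashable], feature: int) -> float:
    """Reduction in output entropy from splitting on one input position."""
    groups = defaultdict(list)
    for x, y in zip(inputs, outputs):
        groups[x[feature]].append(y)
    remainder = sum(len(g) / len(outputs) * entropy(g) for g in groups.values())
    return entropy(outputs) - remainder


def build_stump(inputs: Sequence[Row], outputs: Sequence[Hashable]) -> Callable[[Row], Hashable]:
    """One-level decision tree: split on the most informative input position
    and guess the majority output previously seen for that input's value."""
    best = max(range(len(inputs[0])), key=lambda f: information_gain(inputs, outputs, f))
    votes = defaultdict(Counter)
    for x, y in zip(inputs, outputs):
        votes[x[best]][y] += 1
    fallback = Counter(outputs).most_common(1)[0][0]  # unseen value: overall majority
    table = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    return lambda x: table.get(x[best], fallback)


# Hypothetical call history: (customer tier, region) -> approved?
history_in = [("gold", "eu"), ("gold", "us"), ("basic", "eu"), ("basic", "us")]
history_out = [True, True, False, False]
guess = build_stump(history_in, history_out)
```

Here the tier explains the outputs completely (1 bit of gain) while the region explains nothing (0 bits), so `guess(("gold", "apac"))` answers `True` without ever calling the expensive function.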

So, you need to identify expensive idempotent functions, calculate the entropy of inputs and outputs, build associated decision trees, get some feedback on the performance and load on the system and work out at which point to abandon reason and open the floodgates - all dynamically! Piece of piss... (humm, maybe not).

Anyway, your program would make mistakes when under load but should improve performance and throughput overall. Wtf! Like when would this ever be useful?

  • DoS attacks? Requests could be turned away at the front door to protect services deeper in the system?

  • The Slashdot effect? You may not give the users what they want but you'll at least not collapse under the load.

  • Resiliency? If you're dependent on some downstream component which is not responding (you could be getting timeouts after way too many seconds), then these requests will look expensive and so fall back to some default response (which may or may not be correct!?).
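For the resiliency case, a minimal sketch using the standard library's `concurrent.futures`: stop waiting on a slow dependency after a timeout and return a default instead. `call_with_fallback`, the shared `_POOL` and `flaky_downstream` are hypothetical names, and in the spirit of this post the `default` would ideally be the most common answer observed so far rather than a hard-coded value:

```python
import concurrent.futures as cf
import time

# Hypothetical shared pool for calls to a slow downstream dependency.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn, *args, timeout: float = 0.5, default=None):
    """Call fn(*args), but give up after `timeout` seconds and return `default`.
    The default may or may not be correct: that's the deal."""
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except cf.TimeoutError:
        # The worker thread keeps running in the background; Python threads
        # can't be killed, so we simply stop waiting for the answer.
        return default


def flaky_downstream(user_id: int) -> bool:
    """Hypothetical dependency that has stopped responding promptly."""
    time.sleep(0.5)
    return False
```

One caveat with this approach: the abandoned calls still occupy pool threads, so under sustained failure you'd also want a circuit breaker to stop sending work downstream at all.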


Ok, perhaps not my best idea to date, but I like the idea of computers making mistakes by design rather than through the incompetence of the developer (sorry, harsh, I know; bugs happen, competent or otherwise).

Right, off to take the dog for a walk, or just step outside then come back in again if she's feeling tired...

 
