2014/05/31

Mad Memoization (or how to make computers make mistakes)

Memoization is a technique used to effectively cache the results of computationally expensive functions to improve performance and throughput on subsequent executions. It can be implemented in a variety of languages but is perhaps best suited to functional programming languages where the response to a function should be consistent for a given set of input values. It's a nice idea and has some uses but perhaps isn't all that common since we tend to design  programs so that we only call such functions once; when needed, in any case.

I have a twist on this. Rather than remembering the response to a function with a particular set of values, remember the responses to a function and just make a guess at the response next time.

A guess could be made based on the entropy of the input and/or output values. For example, where the response is a boolean value (true or false) and you find that 99% of the time the response is "true" but it takes 5 seconds to work this out, then... to hell with it, just return "true" and don't bother with the computation. Lazy I know.

Of course some of the time the response would be wrong but that's the price you pay for improving performance throughput.

There would be some (possibly significant) cost to determining the entropy of inputs/outputs and any function which modifies the internal state of the system (non-idempotent) should be avoided from such treatment for obvious reasons. You'd also only really want to rely on such behaviour when the system is busy and nearly overloaded already so you need a way to quickly get through the backlog - think of it like the exit gates of a rock concert when a fire breaks out, you quickly want to ditch the "check-every-ticket" protocol in favour of a "let-everyone-out-asap" solution.

You could even complicate the process a little further and employ a decision  tree (based on information gain for example) when trying to determine the response to a particular set of inputs.

So, you need to identify expensive idempotent functions, calculate the entropy of inputs and outputs, build associated decision trees, get some feedback on the performance and load on the system and work out at which point to abandon reason and open the floodgates - all dynamically! Piece of piss... (humm, maybe not).

Anyway, your program would make mistakes when under load but should improve performance and throughput overall. Wtf! Like when would this ever be useful?

  • DoS attacks? Requests could be turned away at the front door to protect services deeper in the system?

  • The Slashdot effect? You may not give the users what they want but you'll at least not collapse under the load.

  • Resiliency? If you're dependent on some downstream component which is not responding (you could be getting timeouts after way too many seconds) then these requests will look expensive and the fallback to some default response (which may or may not be correct!?).


Ok, perhaps not my best idea to date but I like the idea of computers making mistakes by design rather than through incompetence of the developer (sorry, harsh I know, bugs happen, competent or otherwise).

Right, off to take the dog for a walk, or just step outside then come back in again if she's feeling tired...

 

No comments:

Post a Comment

Voyaging dwarves riding phantom eagles

It's been said before... the only two difficult things in computing are naming things and cache invalidation... or naming things and som...