Creating applications that can scale horizontally is, in theory, pretty simple. Processing must be parallelizable such that the work can be split amongst all member processors and servers in a cluster. Map-reduce is a common pattern implemented to achieve this. Another, even more common, pattern is the simple request-response mechanism of the web. It may not sound like it since each request is typically independent of the others, but from a server's perspective it is arguably an example of parallel processing. Map-reduce handles pre-requisites by breaking jobs down into separate map and reduce tasks (fork and join) and chaining multiple map-reduce jobs. The web implements its own natural scheduling of requests, which must be performed in sequence as a consequence of the wet-ware interacting at a snail's pace with the UI. In this case any state needing to be retained between requests is typically held in sessions - in-memory on the server.
Resiliency though is a different issue than scalability.
In map-reduce, if a server fails then the processing task can be restarted on another node. There'll be some repeat work performed as the results of the in-flight task will have been lost (and maybe more) but computers don't much mind doing repetitive tasks and will quite willingly get on with it without much grumbling (ignoring the question of "free will" in computing for the moment).
Humans do mind repeating themselves though (I've wanted to measure my reluctance to repeat tasks over time since I think it's got progressively worse in recent years...).
So how do you not lose a user's session state if a server goes down?
Firstly, you're likely going to piss someone off. There'll be some request in mid-flight the second the server goes down unless you're in maintenance mode and are quiescing the server cleanly. Of course you could not bother with server session state at all and track all data through cookies running back and forth over the network. This isn't very good - lots of network traffic and not very secure if you need to hold anything the user (or Eve) shouldn't see, or if you're concerned about someone spoofing requests. Sometimes it's viable though...
But really you want a way for the server to handle such failures for you... and with WebSphere Application Server (WAS) there are a few options (see how long it takes me to get to the point!).
==== SCROLL TO HERE IF YOU WANT TO SKIP THE RATTLING ====
The WAS plugin should always be used in front of WAS. The plugin will route requests to the correct downstream app-server based on a clone id tagged on to the end of the session id cookie (JSESSIONID). If the target server is not available (the plugin cannot open a connection to the server) then another will be tried. It also means that whatever HTTP server (Apache, IIS, IHS) a request lands on, it will be routed to the correct WAS server where the session is held in memory. It's quite configurable, on the fly, for problem determination, so it's well worth becoming friends with.
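To make the routing idea concrete, here is a minimal sketch of affinity routing with failover. It is not the plugin's actual implementation: the class name `CloneIdRouter`, the server names, and the assumption that the clone id is whatever follows the last `:` in the cookie value are all mine, purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only: NOT the WAS plugin's real algorithm.
class CloneIdRouter {
    // Hypothetical mapping of clone id -> app-server name, in preference order.
    private final Map<String, String> serversByCloneId = new LinkedHashMap<>();

    CloneIdRouter(Map<String, String> serversByCloneId) {
        this.serversByCloneId.putAll(serversByCloneId);
    }

    // Assumption: the clone id is the text after the last ':' in the cookie value.
    static Optional<String> cloneIdOf(String jsessionId) {
        if (jsessionId == null) {
            return Optional.empty();
        }
        int sep = jsessionId.lastIndexOf(':');
        if (sep < 0 || sep == jsessionId.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(jsessionId.substring(sep + 1));
    }

    // Prefer the server that owns the session; otherwise try any server that is up.
    Optional<String> route(String jsessionId, Set<String> unavailable) {
        Optional<String> preferred = cloneIdOf(jsessionId)
                .map(serversByCloneId::get)            // unknown clone id -> empty
                .filter(server -> !unavailable.contains(server));
        if (preferred.isPresent()) {
            return preferred;
        }
        return serversByCloneId.values().stream()
                .filter(server -> !unavailable.contains(server))
                .findFirst();
    }
}
```

With two servers registered, a cookie ending in the first server's clone id goes to that server while it is up, and falls over to the second when it is marked unavailable. A request with no cookie at all simply lands on the first available server.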
When the request finally lands on the WAS server then you've essentially three options for how you manage sessions for resiliency.
- Local Sessions - Do nothing and all sessions will be held in memory on the local server. In this instance, if the server goes down, you'll lose the session and users will have to log in again and repeat any work they've done to date which is held in session (and note, as above, users don't like repeating themselves).
- Database persistent sessions - Configure a JDBC source and WAS can store changes to the session in a database (make sure all your objects are serializable). The implementation has several options to optimize for performance over safety and the like but at the end of the day you're writing session information to a database - it can have a significant performance impact and adds another pre-requisite dependency (i.e. a supported, available and resilient database). Requests hitting the original server will find session data available in-memory already. Requests hitting another server will incur a database round trip to fetch session state. As a one-off hit it's tolerable but to avoid repeated DB hits you still want to use the plugin.
- Memory to memory replication - Here changes to user sessions are replicated, in the background, between all servers in a cluster. In theory any server could serve requests and the plugin can be ignored, but in practice you'll still want requests to go back to the origin to increase the likelihood that the server has the correct state, as even memory-to-memory replication can take some (small) time. There are two modes this can operate in: peer-to-peer (normal) and client-server (where a server operates as a dedicated session state server).
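The cost difference between hitting the original server and hitting another one in the database persistent case can be shown with a toy model. This is a sketch under my own assumptions, not how WAS implements session persistence: `SharedStore` stands in for the database, `Server` for an app-server with a local in-memory copy, and every write goes straight through to the store.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: all names here are hypothetical, not WAS classes.
class PersistentSessionSketch {

    // Stand-in for the session database; counts reads to expose round trips.
    static class SharedStore {
        private final Map<String, Map<String, String>> rows = new HashMap<>();
        int reads = 0;

        void write(String sessionId, Map<String, String> attrs) {
            rows.put(sessionId, new HashMap<>(attrs));
        }

        Map<String, String> read(String sessionId) {
            reads++;
            Map<String, String> row = rows.get(sessionId);
            return row == null ? null : new HashMap<>(row);
        }
    }

    // An app-server that keeps sessions in memory and falls back to the store.
    static class Server {
        private final Map<String, Map<String, String>> local = new HashMap<>();
        private final SharedStore store;

        Server(SharedStore store) {
            this.store = store;
        }

        void put(String sessionId, String key, String value) {
            Map<String, String> attrs = load(sessionId);
            attrs.put(key, value);
            store.write(sessionId, attrs); // write-through on every change
        }

        String get(String sessionId, String key) {
            return load(sessionId).get(key);
        }

        // Local hit costs nothing; a miss costs one round trip to the store.
        private Map<String, String> load(String sessionId) {
            Map<String, String> attrs = local.get(sessionId);
            if (attrs == null) {
                attrs = store.read(sessionId);
                if (attrs == null) {
                    attrs = new HashMap<>();
                }
                local.put(sessionId, attrs);
            }
            return attrs;
        }
    }
}
```

Reads on the server that created the session never touch the store again, while the first read on a second server costs one round trip. The model deliberately ignores staleness: once the second server has its own copy it will not see later changes made on the first, which is another reason to keep requests going back to the origin.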
My preference is for peer-to-peer memory-to-memory replication due to performance and cost factors (no additional database required which would also need to be resilient, no dedicated session state server). Details of how you can set this up are in the WAS Admin Redbook.
Finally, you should always keep the amount of data stored in session objects to a minimum (<4kB) and all objects need to be serializable if you want to replicate or store sessions in a database. Don't store the complete results of a cursor in session for quick access - repeat the query and return only the results you want (using paging to skip through) - and don't store things like database connections in session; it won't work, at least not for long...
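One way to keep yourself honest here is a small check you can run in a unit test against anything you intend to put in session. The sketch below is my own helper, not a WAS API: it uses standard Java serialization to measure an object and treats the 4kB figure above as the limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical helper for tests; not part of WAS or the servlet API.
class SessionSizeCheck {
    // Limit taken from the 4kB guideline above.
    static final int MAX_BYTES = 4 * 1024;

    // Size of the value once written with standard Java serialization.
    static int serializedSize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    // False if the value is too big or cannot be serialized at all.
    static boolean fitsInSession(Object value) {
        try {
            return serializedSize(value) <= MAX_BYTES;
        } catch (IOException e) {
            // NotSerializableException is an IOException, so it lands here too.
            return false;
        }
    }
}
```

A user name or a page of ten ids passes; an 8kB byte array fails on size, and a plain `new Object()` (like a database connection would) fails because it isn't serializable at all.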