Brian's Waste of Time

Sat, 18 Oct 2003

Sessions on the Client

Bill de hOra posted an article on scalability on the web and on how the current mode of storing session information on the server is broken. I have joked before about writing a session manager that serializes the whole HttpSession and stuffs it into a hidden form field, or into an array of cookies so that it works independently of the page content.
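A minimal sketch of the "serialize the session into a hidden field" half, assuming every session attribute is `Serializable` and has already been copied into a plain `HashMap` (the servlet API does not promise that an `HttpSession` object itself can be serialized). `SessionCodec`, `encode`, and `decode` are hypothetical names. Java deserialization of untrusted bytes is dangerous, so `decode` should only ever see data whose integrity has already been verified; that is exactly why the security problem matters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;

/** Hypothetical codec: session attributes <-> a string for a hidden form field. */
public class SessionCodec {

    // Serialize the attribute map, then Base64-encode it with the URL-safe
    // alphabet so the same string is also usable as a cookie value.
    public static String encode(HashMap<String, Serializable> attributes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attributes);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode. Only call this on data that has already been
    // authenticated: deserializing attacker-controlled bytes is unsafe.
    @SuppressWarnings("unchecked")
    public static HashMap<String, Serializable> decode(String field)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getUrlDecoder().decode(field);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (HashMap<String, Serializable>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Serializable> attrs = new HashMap<>();
        attrs.put("userId", 42L);
        attrs.put("cart", new ArrayList<String>(Arrays.asList("sku-1", "sku-2")));
        String hiddenField = encode(attrs);
        System.out.println(decode(hiddenField).equals(attrs)); // true
    }
}
```

The resulting string would be written into something like `<input type="hidden" name="session" value="...">` on every page and read back from the request parameters on the next submit.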

This absurd idea is becoming more promising as time passes. As Bill points out, web applications will never scale really high (that is, with the concurrent user limit increasing linearly with the amount of hardware) unless state is moved off of the server. So, I have decided to think about how to do this.

The first problem is security. Not being able to trust the session content is a pain in the butt. This is solvable via cryptography: encrypt the state so the client cannot read it, and authenticate it so the client cannot tamper with it. You take a hit serializing, encrypting, decrypting, and deserializing the session state on every request, but this is small potatoes compared to session replication in eBay/Yahoo!/Google-scale webapps.
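The post only says "cryptography". One concrete choice, swapped in here rather than taken from the post, is authenticated encryption with AES-GCM from the standard JCE, which gives secrecy and tamper detection in a single step. This sketch assumes every server in the farm shares the same `SecretKey` (how that key is distributed and rotated is out of scope), and `SessionCrypto`, `seal`, and `open` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper that seals serialized session bytes with AES-GCM. */
public class SessionCrypto {
    private static final int IV_BYTES = 12;   // standard GCM nonce size
    private static final int TAG_BITS = 128;  // 16-byte authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    public SessionCrypto(SecretKey key) {
        this.key = key;
    }

    // Encrypt and authenticate. Output layout: IV || ciphertext || tag.
    public byte[] seal(byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);  // fresh random nonce for every message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Verify and decrypt. Throws AEADBadTagException if any byte was altered.
    public byte[] open(byte[] sealed) throws GeneralSecurityException {
        if (sealed.length < IV_BYTES) {
            throw new AEADBadTagException("sealed session too short");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
        return cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SessionCrypto crypto = new SessionCrypto(gen.generateKey());

        byte[] sealed = crypto.seal("userId=42".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(crypto.open(sealed), StandardCharsets.UTF_8)); // userId=42

        sealed[sealed.length - 1] ^= 1;  // simulate a client editing its cookie
        try {
            crypto.open(sealed);
        } catch (AEADBadTagException e) {
            System.out.println("tampering detected");
        }
    }
}
```

The design choice here is to reject anything that fails the tag check before it ever reaches `ObjectInputStream`. Note this stops the client from reading or editing its session, but not from replaying an older, validly sealed copy; guarding against that would need something extra, such as an expiry timestamp inside the sealed data.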

The second problem is size. It is *really* nice to keep a reference to the logged-in user, the shopping cart, the history, whatever, in the session. This is mostly a convenience to avoid re-materializing those objects from the database, but it is a major convenience, as database hits are expensive (or rather, massively scalable clustered databases are expensive), so we want to avoid making that hit if we can. There is no way around this: we need to minimize the volume of data stored in the session until everyone has T3-sized broadband.
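To put rough numbers on the size problem, here is a back-of-envelope calculator. It assumes the state is sealed with AES-GCM (12-byte nonce plus 16-byte tag), encoded as unpadded Base64, and split into cookies with a hypothetical 3,800-character payload budget per cookie; it ignores compression. `SessionWireSize` is a made-up name.

```java
/** Hypothetical estimator for what a client-side session costs on the wire. */
public class SessionWireSize {
    static final int GCM_OVERHEAD = 12 + 16;  // assumed nonce + authentication tag
    static final int COOKIE_CHUNK = 3800;     // assumed payload budget per cookie

    // Length of unpadded Base64 for n input bytes, i.e. ceil(4n / 3).
    static int base64Length(int n) {
        return (4 * n + 2) / 3;
    }

    // Characters the session occupies once sealed and Base64-encoded.
    public static int encodedChars(int serializedBytes) {
        return base64Length(serializedBytes + GCM_OVERHEAD);
    }

    // Number of cookies needed to carry it (ceiling division).
    public static int cookiesNeeded(int serializedBytes) {
        int chars = encodedChars(serializedBytes);
        return (chars + COOKIE_CHUNK - 1) / COOKIE_CHUNK;
    }

    public static void main(String[] args) {
        for (int size : new int[] {200, 2_000, 20_000}) {
            System.out.printf("%6d bytes -> %6d chars, %d cookie(s)%n",
                    size, encodedChars(size), cookiesNeeded(size));
        }
    }
}
```

For example, a 20,000-byte serialized session comes out to 26,704 characters, or 8 cookies under these assumptions. The browser sends those cookies back on every request to the site, so the upstream cost is paid per request, not per login.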

Bandwidth is a big problem here, because you pay for bandwidth. The thing is, the available bandwidth can scale up linearly; it is just expensive. You need a revenue model where the traffic you build for makes enough to pay for the increased bandwidth. No way around that. You can help some by compressing the serialized session information before encrypting it (encrypted data looks random, so it will not compress afterward). This puts even more load on the servers, but servers are cheap, and since every request now carries its own session state, the session is effectively replicated to every server for free.
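The ordering matters: compress first, then encrypt. A small illustration using the JDK's GZIP streams, with random bytes standing in for ciphertext (good ciphertext is indistinguishable from random data). `SessionCompression` is a hypothetical name, and the repetitive string is only a stand-in for a real serialized session. One caution beyond the post: compressing secrets together with attacker-influenced data before encryption can leak information through the output length, as in the CRIME/BREACH attacks, so it is worth thinking about what ends up in the session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: GZIP the serialized session before it is encrypted. */
public class SessionCompression {

    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    public static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();  // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Repetitive data, loosely like serialized objects full of class and field names.
        byte[] plain = "sku-1,sku-2,sku-3,".repeat(100).getBytes(StandardCharsets.UTF_8);
        // Random bytes of the same length, standing in for already-encrypted data.
        byte[] random = new byte[plain.length];
        new SecureRandom().nextBytes(random);

        System.out.println("plaintext:  " + plain.length + " -> " + gzip(plain).length);
        System.out.println("ciphertext: " + random.length + " -> " + gzip(random).length);
    }
}
```

Running it shows the repetitive input shrinking dramatically while the random input actually grows slightly from the GZIP header and framing.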

That's it. Those are the drawbacks. The recipe: serialize all session data, compress it, encrypt it, chunk it into cookie-sized pieces, and send the cookies; on the next request, retrieve the cookies, reassemble the pieces, then decrypt, decompress, and deserialize it again. The major drawback comes from accessing the data that is *not* session oriented: prices, products, etc. You still need fancy database clustering techniques (reads from an array of slaved databases, writes to a smaller cluster of write-enabled master databases). Still, it is a step up in some situations. Hmm, oh yeah, guess you need cookies enabled. Maybe I'll take a stab at making this at the ApacheCon hackathon.
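The "chunk it into cookie-sized pieces" step might look something like this sketch. It models cookies as a plain name-to-value `Map` to stay self-contained; in a servlet, each entry would become a `javax.servlet.http.Cookie` added with `response.addCookie`. The class name, the `sess` prefix, the `sessN` count cookie, and the 3,800-character chunk size are all hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: spread an encoded session across several cookies and join it back. */
public class CookieChunker {
    // RFC 6265 asks browsers to accept at least 4096 bytes per cookie
    // (name + value + attributes), so leave headroom. Both values are assumptions.
    static final int CHUNK_CHARS = 3800;
    static final String PREFIX = "sess";

    // Returns cookie name -> value: "sess0", "sess1", ... plus "sessN" holding the chunk count.
    public static Map<String, String> split(String encoded) {
        Map<String, String> cookies = new LinkedHashMap<>();
        int count = 0;
        for (int i = 0; i < encoded.length(); i += CHUNK_CHARS) {
            int end = Math.min(encoded.length(), i + CHUNK_CHARS);
            cookies.put(PREFIX + count, encoded.substring(i, end));
            count++;
        }
        cookies.put(PREFIX + "N", Integer.toString(count));
        return cookies;
    }

    // Rebuilds the string from the cookies the browser sent back. The explicit
    // count means stale chunks left over from a larger, older session are ignored.
    // Returns null if there is no session or a chunk is missing or malformed.
    public static String join(Map<String, String> cookies) {
        String countValue = cookies.get(PREFIX + "N");
        if (countValue == null) {
            return null;  // no client-side session yet
        }
        int count;
        try {
            count = Integer.parseInt(countValue);
        } catch (NumberFormatException e) {
            return null;  // count cookie was mangled
        }
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < count; n++) {
            String chunk = cookies.get(PREFIX + n);
            if (chunk == null) {
                return null;  // a chunk was lost; treat the session as missing
            }
            sb.append(chunk);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = "x".repeat(8000);  // stands in for a sealed, Base64-encoded session
        Map<String, String> cookies = split(encoded);
        System.out.println(cookies.keySet());               // [sess0, sess1, sess2, sessN]
        System.out.println(join(cookies).equals(encoded));  // true
    }
}
```

The count cookie is not itself encrypted, but that is fine here: if a client fiddles with it, the reassembled string simply fails the decryption and authentication step that follows.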
