Clustering Tomcat is Easier than People Think
JGroups is all the rage for server clustering in the Java world at the moment. It is also a pain in the butt at times. Tomcat does something similar with its internal multicast session replication (backported from 5 to 4), but it is rather difficult to get working, and finicky once it does.
So, instead of fighting with multicast in-memory replication, remember that jk2 does server affinity very nicely with Tomcat behind it, and use a database to store session information. It works disturbingly well, and is much easier to configure.
It's quite easy to do this; let's look at a typical <Context /> element with a database-backed session manager:
```xml
<Context path="" docBase="/usr/local/my-web-app" debug="0">
  <Manager className="org.apache.catalina.session.PersistentManager"
           debug="0"
           saveOnRestart="true"
           maxActiveSessions="-1"
           minIdleSwap="30"
           maxIdleSwap="600"
           maxIdleBackup="0">
    <Store className="org.apache.catalina.session.JDBCStore"
           driverName="org.postgresql.Driver"
           connectionURL="jdbc:postgresql://db.example.com/tomcat?user=user&amp;password=password"
           sessionTable="tomcat_sessions"
           sessionIdCol="id"
           sessionDataCol="data"
           sessionValidCol="valid"
           sessionMaxInactiveCol="maxinactive"
           sessionLastAccessedCol="lastaccess"
           checkInterval="60"
           debug="0" />
  </Manager>
</Context>
```

The difference between this and other contexts is the <Manager /> element. It specifies a database to store sessions in, and that database requires a pretty basic schema:
```
tomcat=# \d tomcat_sessions
       Table "public.tomcat_sessions"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 id          | character varying(100) | not null
 valid       | character(1)           | not null
 maxinactive | integer                | not null
 lastaccess  | bigint                 |
 data        | bytea                  |
 app         | bytea                  |
Indexes: tomcat_sessions_pkey primary key btree (id)
```

This can sit in pretty much any database. MySQL in uberfast mode (i.e., no transactions) is a good choice, as session storage is inherently single-threaded. I use Postgres because I like Postgres =)
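If you want to create that table from scratch, the equivalent DDL would be something like the following (derived from the \d output above; tweak the types for your database of choice):

```sql
-- Session table for Tomcat's JDBCStore
CREATE TABLE tomcat_sessions (
    id          VARCHAR(100) NOT NULL,   -- sessionIdCol
    valid       CHAR(1)      NOT NULL,   -- sessionValidCol
    maxinactive INTEGER      NOT NULL,   -- sessionMaxInactiveCol
    lastaccess  BIGINT,                  -- sessionLastAccessedCol
    data        BYTEA,                   -- sessionDataCol (serialized session)
    app         BYTEA,
    CONSTRAINT tomcat_sessions_pkey PRIMARY KEY (id)
);
```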
Now the fun part, making it really useful for a cluster. I will start with the presumption that we are using a small cluster, say four app servers: sam, frodo, pippin, and merry; two http sprayers: morgul and isengard; and a database server: db (I know, it breaks the naming convention).
The sprayers are running Apache HTTPD with jk2, with an entry like the following for each app server in workers2.properties:
```
[channel.socket:frodo.example.com:8009]
info=Ajp13 forwarding over socket
debug=0
group=lb
tomcatId=frodo
```

s/frodo/$server_name/g for each additional app server. The tomcatId is important.
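Spelled out for the whole cluster, the sprayers' jk2 configuration would look something like this (hostnames follow the naming above; the [uri:/*] mapping is an assumption about how the apps are deployed):

```
# One channel per app server; tomcatId must match that server's jvmRoute
[channel.socket:sam.example.com:8009]
info=Ajp13 forwarding over socket
debug=0
group=lb
tomcatId=sam

[channel.socket:frodo.example.com:8009]
info=Ajp13 forwarding over socket
debug=0
group=lb
tomcatId=frodo

[channel.socket:pippin.example.com:8009]
info=Ajp13 forwarding over socket
debug=0
group=lb
tomcatId=pippin

[channel.socket:merry.example.com:8009]
info=Ajp13 forwarding over socket
debug=0
group=lb
tomcatId=merry

# Send everything through the load-balanced group
[uri:/*]
group=lb
```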
Each app server needs to modify its <Engine /> element to include a jvmRoute attribute:

```xml
<Engine name="Standalone" defaultHost="localhost" debug="0" jvmRoute="frodo">
```

Works for frodo. The jvmRoute attribute needs to match the tomcatId for that server in the jk2 configuration. In addition, each app server needs the session manager described earlier.
Round-robin the external DNS between the HTTPD instances. They are stateless with respect to affinity -- all the information they need is contained in the JSESSIONID. If the app server a sprayer wants to send the request to is down, it picks another at random. That app server realizes it doesn't have the session, and you get a database hit to retrieve it. This only happens when a server goes down, however; normally you will have full session affinity.
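The reason the sprayers can be stateless: Tomcat appends the jvmRoute to the session id, so the cookie value looks like 0A1B2C3D9F.frodo (session id made up), and jk2 routes on the part after the last dot. A minimal sketch of that routing decision, in Python purely for illustration:

```python
def route_for(jsessionid):
    """Return the worker name encoded in a JSESSIONID, if any.

    Tomcat appends '.' + jvmRoute to the session id, so
    '0A1B2C3D9F.frodo' routes to the worker with tomcatId 'frodo'.
    """
    if "." in jsessionid:
        return jsessionid.rsplit(".", 1)[1]
    return None  # no route encoded: pick a worker at random
```

If the named worker is down, the sprayer falls back to another worker, which then pulls the session out of the database.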
The problem with this is a database update per request. That can really put the crimp on an already bogged-down database server. However, it is easily solved: use a separate database on a separate network. Each app server will have two or three NICs and be on two or three networks. The first is the network between the app servers and the sprayers. The second is a network between the app servers and the session database. The third is an administration network. This is overkill, but hey, NICs and switches are cheap.
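In practice that just means the connectionURL in the <Store /> element points at a hostname that resolves to the database's NIC on the session network. Assuming a hypothetical name db-sessions.example.com for that interface, the only change from the earlier configuration is the URL:

```xml
<Store className="org.apache.catalina.session.JDBCStore"
       driverName="org.postgresql.Driver"
       connectionURL="jdbc:postgresql://db-sessions.example.com/tomcat?user=user&amp;password=password"
       sessionTable="tomcat_sessions"
       sessionIdCol="id"
       sessionDataCol="data"
       sessionValidCol="valid"
       sessionMaxInactiveCol="maxinactive"
       sessionLastAccessedCol="lastaccess"
       checkInterval="60"
       debug="0" />
```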
This setup will handle a lot of load, and the chokepoint will normally become the RDBMS. C-JDBC might be the cost-effective way to solve that problem, but I don't yet trust it enough for production use. It may be there, but my understanding of it, and my faith in it, isn't. Postgres's recently released replication server can provide failover capabilities, but it still means you need a big, performant database. Anyone have a good solution for that?