Brian's Waste of Time

Mon, 25 Sep 2006

Working With the Grain

A good while back I investigated the various ways to do FastCGI on Apache 2.X. Along the way I started (finally) learning my way around writing Apache HTTPD modules. It has been a great experience, though frustrating at times (mostly dealing with makefiles).

So, off the bat, if you want to use FastCGI with Apache 2.X right now, use mod_fcgid. It works great. In the (hopefully near) future, mod_proxy_fcgi will be released as part of the main httpd distribution (I didn't write it, so it probably works!).

Okay, all of that said, it was an incredibly frustrating experience for a couple of reasons. The real goal was to find a good way to run Rails and Django and TurboGears and whatnot on Apache HTTPD. Should be trivial, right? Sadly, it is not. Nor is it trivial to do it on lighttpd, nor on much else. The only easy ways to run Rails are on WEBrick or Mongrel. Seriously. Neither of these is a good option as a primary web server, which is unfortunate. The situation is similar for Python, though I don't do much Python work anymore, so I'll keep my trap shut on that front ;-)

Apache HTTPD is the obvious option for a robust general web server designed for writing web apps. To get the most out of it, though, you need to use an MPM other than prefork. The main reason to avoid prefork, for Ruby and Python apps, is the massive memory footprint engendered by loading a Ruby or Python VM in each httpd process. They both have fairly hefty memory requirements for their VMs, so having lots of processes (which can work fine for small VM systems, like PHP) gets painful disturbingly quickly. The catch is that you need prefork to run them in-process, because they both have, basically, broken behavior with regard to POSIX threads.

The alternative to having a VM per httpd process is to have a pool of processes and play games with various protocols to hand off the request to the scripting VM. The aforementioned FastCGI is a popular option, just proxying the HTTP request is another popular one, AJP is yet another, as is SCGI. This is what we all wind up doing, not just in Ruby and Python but also in Java (where AJP dominates). This certainly works, but it would be nice to not add the network (or domain socket) hop if we could help it.
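To give a feel for what one of these hand-off protocols involves, here is a minimal sketch of building an SCGI request in C. SCGI sends the headers as a netstring (`<length>:<bytes>,`) of NUL-terminated name/value pairs, with `CONTENT_LENGTH` first and a mandatory `SCGI` header set to `1`, followed by the raw body. The function names `scgi_add_header` and `scgi_encode` and the fixed 1024-byte header buffer are assumptions made for illustration; they do not come from any existing module.

```c
#include <stdio.h>
#include <string.h>

/* Append one "name\0value\0" pair to buf at offset *len.
 * Returns 0 on success, -1 if the pair does not fit in cap bytes. */
static int scgi_add_header(char *buf, size_t cap, size_t *len,
                           const char *name, const char *value)
{
    size_t n = strlen(name) + 1;   /* +1 keeps the NUL terminator */
    size_t v = strlen(value) + 1;
    if (*len + n + v > cap)
        return -1;
    memcpy(buf + *len, name, n);
    *len += n;
    memcpy(buf + *len, value, v);
    *len += v;
    return 0;
}

/* Build a complete SCGI request (headers netstring + body) into out.
 * body must be non-NULL, even when body_len is 0.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t scgi_encode(char *out, size_t cap,
                   const char *method, const char *uri,
                   const char *body, size_t body_len)
{
    char headers[1024];            /* arbitrary limit for this sketch */
    size_t hlen = 0;
    char clen[32];
    int prefix;
    size_t pos;

    snprintf(clen, sizeof clen, "%zu", body_len);

    /* CONTENT_LENGTH must come first; "SCGI" = "1" is required. */
    if (scgi_add_header(headers, sizeof headers, &hlen, "CONTENT_LENGTH", clen) ||
        scgi_add_header(headers, sizeof headers, &hlen, "SCGI", "1") ||
        scgi_add_header(headers, sizeof headers, &hlen, "REQUEST_METHOD", method) ||
        scgi_add_header(headers, sizeof headers, &hlen, "REQUEST_URI", uri))
        return 0;

    /* Netstring framing: decimal length, ':', payload, ','. */
    prefix = snprintf(out, cap, "%zu:", hlen);
    if (prefix < 0 || (size_t)prefix + hlen + 1 + body_len > cap)
        return 0;

    pos = (size_t)prefix;
    memcpy(out + pos, headers, hlen);
    pos += hlen;
    out[pos++] = ',';
    memcpy(out + pos, body, body_len);
    pos += body_len;
    return pos;
}
```

A caller would fill a buffer with something like `scgi_encode(buf, sizeof buf, "GET", "/", "", 0)` and write the result to the socket connected to the backend process. Even in this stripped-down form, every request gets copied and re-framed before the scripting VM ever sees it, which is the extra hop being complained about.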

So, while learning my way around HTTPD's internals (which are very nice, if poorly documented) I mostly mucked with approaches to making Ruby work in conjunction with httpd in a less kludgy way (proxying out to another server is kludgy). I really wanted to be able to execute the Ruby in-process so that it could hook into all the httpd processing steps, a la the existing mod_ruby, but this was a royal pain because of Ruby's threading borkiness (Ruby uses green threads). Even if it didn't have its threading borkiness, it would mean bleed-over of state between handlers (loaded libraries, etc.), which in a general purpose server is not a good thing.

The basic problem, in my opinion, is that Ruby and Python both like to be the language. You extend them, you don't embed them. You use Ruby to drive C, not C to drive Ruby. Extending Ruby is fun and easy. Extending httpd with Ruby is pain. Anything can be worked around, of course, but it feels like just that, work. You are fighting the grain of the problem, and basically using a great general purpose web server and application platform as an HTTP scrubber, cache, and static resource server.

So, in my copious free time I have been trying to find a good way that works with httpd rather than against it. I really do believe that scripting languages are the best way to glue most webapps together, so I want to keep that. In order to work nicely with httpd the language has to be fairly agnostic with regard to running in a multithreaded server (or even a single-thread-for-everything server like lighttpd, I think). So, I went looking for a scripting language that does play nicely therein (Ruby doesn't -- I love Ruby, I use Ruby daily, but it is crap for embedding in an embarrassingly concurrent, multithreaded server).

It is surprising how bad the state of robust, thread-friendly scripting languages is. I have found three (and a half) so far that look like they'll fit my needs. It is a surprising list until you think about it. They are Lua, JavaScript (SpiderMonkey), and Tcl. The half is Ficl. I am not sure it works yet, but it should. It should also give me lots of headaches, being Forth-based :-)

The shocking part of this list is that there is not a single Scheme on it. I looked at embedding Guile, Gauche, Chicken, Elk, MzScheme, Scheme48, and a couple of others whose names escape me. They all (except Chicken, which requires a compilation step) use global state. Sometimes this is just for function declaration, but this still causes pollution of other interpreters' namespaces, so it doesn't work for this situation. I was, I really was, shocked by this. I figure there has to be a good one (for embedding in this way) out there, but I haven't found it yet. I digress.
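The global-state problem is easier to see in a toy example. The sketch below is a hypothetical, stripped-down "interpreter" in C whose definitions all live in a struct that the caller passes to every function, which is the same general shape as Lua's C API, where every call takes a `lua_State *`. The names `interp`, `interp_init`, `interp_define`, and `interp_lookup` are invented for illustration and do not belong to any real language runtime.

```c
#include <string.h>

#define MAX_DEFS 16

/* Hypothetical toy interpreter: every binding lives in this struct,
 * so two instances never share a namespace. */
typedef struct {
    const char *names[MAX_DEFS];
    int values[MAX_DEFS];
    int count;
} interp;

void interp_init(interp *in)
{
    in->count = 0;
}

/* Define or overwrite a binding in this interpreter only.
 * The caller must keep the name string alive.
 * Returns 0 on success, -1 if the table is full. */
int interp_define(interp *in, const char *name, int value)
{
    int i;
    for (i = 0; i < in->count; i++) {
        if (strcmp(in->names[i], name) == 0) {
            in->values[i] = value;
            return 0;
        }
    }
    if (in->count == MAX_DEFS)
        return -1;
    in->names[in->count] = name;
    in->values[in->count] = value;
    in->count++;
    return 0;
}

/* Look up a binding. Returns 1 and stores the value in *out
 * if found, 0 otherwise. */
int interp_lookup(const interp *in, const char *name, int *out)
{
    int i;
    for (i = 0; i < in->count; i++) {
        if (strcmp(in->names[i], name) == 0) {
            *out = in->values[i];
            return 1;
        }
    }
    return 0;
}
```

With two `interp` values, a definition made through one is invisible to the other, so a server could give each handler (or each thread) its own instance. If the table were a file-level global instead, as in the Schemes described above, every handler in the process would see every other handler's definitions.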

The "going with the grain" thing applied to looking at scripting languages as well. You could almost guess at first glance at the docs which languages would work well for this and which wouldn't. There is a definite grain to languages. Ruby and Python, for example, are both used to being the primary language in a system (or process, at least). You work in Ruby and call out to C or OCaml. On the other hand, looking at SpiderMonkey or Lua, it is immediately clear that they expect to be embedded in something else.

There is a big benefit to working with something that sees itself as the first class citizen -- they tend to have better^w more extensive libraries. For libraries, Perl > PHP > Python > Ruby > TCL > Lua > SpiderMonkey. That said, C and Java have them all beaten, and hooking C into Lua, at least, is even more trivial than with Ruby or PHP. It just means embracing C, which is something people have been working to escape. C is really good at slinging bits around on *nixy systems though. Really, really good. Pretty much better than anything else, even if it is not as nice on developers as other languages/environments.

So, where does this ramble lead? I have been playing around a fair bit with Lua + C libraries + httpd (modules) based webapp development. I really like it. It feels like things fit together as a system rather than just being a mixture of internally consistent, but externally headbutting, systems. I don't have a great way of talking to databases yet. I am looking forward to ApacheCon and trying to figure out how to take advantage of the changes in the Apache 2.4 event MPM (so you can disconnect the response handling from the service thread and go non-blocking the whole way) and whatnot. I can say one thing: it is fun as hell getting used to the order-of-magnitude performance increase from working in-process on httpd compared to Java or Ruby.
