Rusko (don't have a better name for him) got me thinking the other night by doing the dastardly deed of pointing nevow out and discussing why apache (web server) sucked (his words, not mine (for throughput-oriented, highly static, sites)).
So, here is a stab at a design to maximize network throughput. It makes use of some things that don't exist in Java (yet, hopefully) so please excuse the rubyish pseudocode.
We start with a simple looping process:
class Looper
def initialize
@loop = Loop.new
@loop << HttpAcceptor.new(@loop, 80)
end
def start
@finished = false
callcc do |escape|
@loop.cycle(:http) do |bucket|
bucket.continue(callcc { |c| return c }) if bucket.ready?
@loop.remove(bucket) if bucket.finished?
escape.call if @finished
end
end
end
def stop
@finished = true
end
end
A couple pieces of magic there. Loop
is the first one. This is a cyclical linked list which supports internal iteration via Loop#cycle
where it is passed a cycle key, which identifies which things in the loop we care about (there may be things in the loop we don't know how to handle, this one handles http stuff, another might handle database stuff, but they can share a loop). Importantly, iteration is threadsafe (lets pretend that Ruby supports native threads, why this is important later) and while an element is "held" by an iterator, it is skipped over as if not there by any others.
The second piece of magic is the HttpAcceptor
. This just handles accepting tcp connections, and implements the (server side) http protocol. It is a protocol abstraction layer which uses asynchronous IO (NIO for the Java folks). It also implements the Bucket
protocol, which we haven't met yet. Note that it is passed the @loop
variable. When it receives a new connection it instantiates a Request
and adds it to the loop. Where in the loop doesn't actually matter, which is nice.
Now, the Request
is interesting:
class Request
def initialize(loop, content_generator)
@loop = loop
@content = content_generator
end
def continue(go_back)
@go_back = go_back
if @pickup
@pickup.call
else
content.start(self)
end
end
def return_when(key)
@loop << Bucket.new(key) { yield }
callcc do |c|
@pickup = c
@go_back.call
end
end
def finished?
content.finished?
end
def ready?
@content.ready?
end
end
The Request
is created and given a content_generator
which is much like a Java HttpServletRequest
. The Request
in this case just handles passing control around, and the content_generator
does the interesting stuff (ie, is used by people writing apps). We'll see an example of one shortly =)
The Bucket
is just a bucket implementation which is finished when the block it is passed in its initializer finishes. It responds to cycles asking for its key
.
Now, the content generator may be really simple:
class SimpleContent
def initialize(out)
@out = out
end
def start(request)
@out.puts "Hello World"
end
def finished?
true
end
end
But that is boring, lets look at something that needs to do something where it yields control to a slow process...
class InterestingContent
def initialize(out)
@out = out
end
def start(request)
@out.puts "Bouncer likes "
request.return_when(:database) { @puppy = Puppies.find_by_name 'Bouncer' }
@out.puts(@puppy.favorite_toy)
@finished = true
end
def finished?
@finished # nil == false
end
end
Now, this is much more interesting because it uses the Request#return_when
which queues out the (io bound) database call. It specifies it is a :database
operation, and does its stuff.
Now, lets follow a single thread of execution from socket connection through to response completion.
InterestingContent
generator via some means, creates an io buffer, and instantiates generator, uses that to create a Request
, drops the request in the loop, and invokes the continuation it was passed.
go_back
continuation and looks to see if it has an @pickup
instance var. It doesn't, so it starts the content generator, passing in itself.
return_when(:database) ...
call. Now this is interesting...
@go_back
continuation, which goes back to the looper...
@pickup
so it invokes it.
return_to
which promptly returns to the interesting content generator, which now has the puppy it was looking for, so it spews some more stuff to the output buffer and says its finished.
Now, this is all a single-process operation. There is a co-routine set up between the looper and the request, trading control back and forth via invoking continuations whenever an operation will not be able to execute immediately (needs network io, database io, etc). This maximizes processing time by allowing the cpu to be churning whenever there will be (an expected) io wait. The real impl will be significantly trickier as it needs to do the selecting on io streams, but hey, this is pseudo code for the concepts.
No threads, one process, way higher throughput under heavy load. Sort of scary.
Now, to go back to the loop being thread safe despite there being no threads, and Ruby supporting native threads -- on a multi-cpu machine you start one looper per cpu on the same loop instance =)
Now, what about the database? Well, two things -- you could have a dedicated database process, but this seems rather inefficient unless the database is going all the time. Instead, have a similar setup for the database, and add it to the loop as a Bucket! You can loop through the loop from a bucket in the loop, same type of coroutine as the looper/request, but is now HttpLooper/DatabaseLooper, where the http looper and the database looper are both in the loop, and trade control back and forth.
Now, we can get more fun by not making the loop an actual loop, but a priority queue instead!