Brian's Waste of Time

Thu, 15 May 2008

Topology Aware Consistency Policies

I am increasingly fascinated by the consistency options that topology awareness on the client makes available in a distributed storage system. For example, if you consider a write committed iff it has been made to a majority of all storage nodes and to a majority of the local nodes (where "local" would typically mean "same datacenter"), then you can achieve repeatable-read, read-what-you-wrote consistency locally once a majority of local nodes have responded to a read request with matching responses, while still providing overall consistency across the entire system.
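A minimal Java sketch of that commit rule and the matching local read rule, assuming each node is tagged with a datacenter name and that replicas return comparable values. `TopologyQuorum`, `isCommitted`, and `localRead` are hypothetical names for illustration. The local read works because any two majorities of the local nodes must overlap, so a local majority returning matching responses must include a node that saw the latest committed write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: node ids, datacenter names, and method names are illustrative.
class TopologyQuorum {
    private final Map<String, String> datacenterOf; // node id -> datacenter name
    private final String localDc;                   // the client's own datacenter
    private final int localTotal;                   // number of nodes in localDc

    TopologyQuorum(Map<String, String> datacenterOf, String localDc) {
        this.datacenterOf = datacenterOf;
        this.localDc = localDc;
        int n = 0;
        for (String dc : datacenterOf.values()) {
            if (dc.equals(localDc)) n++;
        }
        this.localTotal = n;
    }

    private static boolean majority(int count, int total) {
        return count > total / 2;
    }

    private boolean isLocal(String node) {
        return localDc.equals(datacenterOf.get(node));
    }

    /** Committed iff acknowledged by a majority of all nodes AND a majority of local nodes. */
    boolean isCommitted(Set<String> ackedBy) {
        int localAcks = 0;
        for (String node : ackedBy) {
            if (isLocal(node)) localAcks++;
        }
        return majority(ackedBy.size(), datacenterOf.size())
            && majority(localAcks, localTotal);
    }

    /**
     * Returns the value once a majority of local nodes agree on it, or null if no
     * local majority has formed yet (keep waiting, or fall back to a global read).
     */
    String localRead(Map<String, String> responsesByNode) {
        Map<String, Integer> votes = new HashMap<>();
        for (Map.Entry<String, String> e : responsesByNode.entrySet()) {
            if (!isLocal(e.getKey())) continue; // remote replies don't count toward the local quorum
            int n = votes.merge(e.getValue(), 1, Integer::sum);
            if (majority(n, localTotal)) return e.getValue();
        }
        return null;
    }
}
```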

0 writebacks [/src] permanent link

Sat, 10 May 2008

Long Tail Treasure Trove Slides!

Gianugo has posted the slides from our JavaOne presentation, on Slideshare and as a PDF. The talk was awesome to give; we had a killer audience. A huge thank you to all who attended!

2 writebacks [/src/java] permanent link

Wed, 23 Apr 2008

The Shape of Async Callback APIs

When we have async callbacks in a Java API, the idiomatic way of writing the interface to register the callback looks like this:

Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
});

I'd like to propose that we adopt a new idiom, which is to pass an Executor along with the listener:

Executor myExecutor = Executors.newSingleThreadExecutor();
// ...
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
}, myExecutor);

The main benefit is that you give the caller control over the threading model for the callback. Right now, most libraries either have a separate thread pool for callbacks or invoke the callback on the event generator's thread. Usually there is nothing but an obscure reference on a wiki to indicate which.
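A minimal Java sketch of the proposed shape, assuming one-shot listeners (each registration completes its `Future` with the result for the next event). `Event`, `Listener`, and `AsyncEventThing` are hypothetical types mirroring the snippets above, not any real library's API. The event source never runs listener code itself; it only hands each registration to the `Executor` the caller supplied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical types mirroring the snippets above; not any real library's API.
class Event {
    private final String somethingNifty;
    Event(String somethingNifty) { this.somethingNifty = somethingNifty; }
    String getSomethingNifty() { return somethingNifty; }
}

interface Listener<T> {
    T onEvent(Event e);
}

class AsyncEventThing {
    // One-shot registrations: each one completes its Future with the next event's result.
    private final List<Consumer<Event>> pending = new ArrayList<>();

    synchronized <T> Future<T> addListener(Listener<T> listener, Executor executor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        pending.add(event -> executor.execute(() -> {
            try {
                result.complete(listener.onEvent(event));
            } catch (RuntimeException ex) {
                result.completeExceptionally(ex);
            }
        }));
        return result;
    }

    /** Called on the event generator's thread; listener bodies run on the caller-chosen executors. */
    void fire(Event event) {
        List<Consumer<Event>> toRun;
        synchronized (this) {
            toRun = new ArrayList<>(pending);
            pending.clear();
        }
        for (Consumer<Event> dispatch : toRun) {
            dispatch.accept(event);
        }
    }
}
```

A caller who actually wants the callback on the event generator thread can say so explicitly by passing `Runnable::run` as the executor, which puts the threading behavior at the call site rather than on a wiki.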

2 writebacks [/src/java] permanent link

Tue, 15 Apr 2008

If We Had to Drop Java

So, thought experiment. If we, as an industry, had to drop Java, the language and the virtual machine, for some reason, what could really move into its niche?

Some points to consider:

Putting aside the "damn I want to use coolness X," what out there provides something that could do it?

4 writebacks [/src] permanent link

Sun, 16 Mar 2008

mod_wombat and the GSoC

Nathan wrote up a great blog post about thoughts for working on mod_wombat (Lua in Apache) for this coming Google Summer of Code. I'd be extremely excited (along with Nathan and Matthew, I suspect) to mentor someone on it if it sounds interesting to folks out there :-)

0 writebacks [/src/wombat] permanent link

Sat, 01 Mar 2008

Revisiting Groovy

I haven't actually used Groovy much since, oh, egads, umh, 2004. It was at 1.0b6 and was taking a direction which I both disagreed with and found kind of boring. It was throwing away large chunks of dynamicity for some performance gains as it decided it really wanted to be compiled, after all. Large chunks of the new syntax I also disagreed with so... I wandered away, wishing everyone luck.

Well, funny things can happen in, er, three and a half years, so when a coworker suggested we look at Groovy for solving a problem my initial reaction was "erk, umh, I kind of like our use of JRuby for that" but Groovy wasn't even at 1.0 when last I used it, so it was a pretty unfair reaction. Looking again, it is at 1.5.4! Time to revisit!

After noting that none of my old code parsed, I worked through the tutorial. This isn't the same language I last used. It smells like Perl or PHP more than Ruby, which it rather resembled back then. Overall, my second "first" impression: totally practical, rather ornery. Will dig into it more.

0 writebacks [/src/groovy] permanent link

Wed, 27 Feb 2008

Method Chaining

A coworker commented to me today "what's up with all these libraries that encourage method chaining? ;-)" when we were talking about FEST. To stay in context, we are talking about this kind of thing:

assertThat(yoda).isInstanceOf(Jedi.class)
                .isEqualTo(foundJedi)
                .isNotEqualTo(foundSith);

This, of course, has also been called nice things like "train wreck" and is frequently seen to be a brittleness inducer in code. On the other hand, I encourage the heck out of it in libraries I write, from jDBI for example:

handle.prepareBatch("insert into something (id, name) values (:id, :name)")
        .add(1, "Brian")
        .add(2, "Keith")
        .add(3, "Eric")
        .execute();

On yet another hand, I pointed out that it was a bad practice to someone in a code review just last week. So, when is it a good fluent interface, and when is it a train wreck? Good question. My first reaction is "I know it when I see it" but that isn't very useful. So, to take a stab at a description...

Method chaining makes a good interface when the chained methods all come from the same module, are part of the published API, and when taken together represent a single logical action. In the first example, they are all on the published interface of FEST-Assert and are asserting that yoda is correct. In the second, they all come from the published interfaces of jDBI and form one batch statement.
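To make the distinction concrete, here is a minimal sketch of a fluent batch API in the same spirit as the jDBI example. `Batch` is a hypothetical class that only records parameter rows rather than talking to a database; it is not jDBI's actual implementation. Each chaining method returns `this`, so every call in the chain is on the same published type, and `execute()` ends the chain as one logical action.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical fluent batch, loosely modeled on the jDBI example above.
// It records parameter rows instead of talking to a database.
final class Batch {
    private final String sql;
    private final List<List<Object>> rows = new ArrayList<>();

    Batch(String sql) {
        this.sql = sql;
    }

    /** Queue one row of positional parameters; returns this so calls chain. */
    Batch add(Object... params) {
        rows.add(Collections.unmodifiableList(Arrays.asList(params)));
        return this;
    }

    /** Terminal operation: ends the chain and completes the single logical action. */
    List<List<Object>> execute() {
        // A real implementation would bind each row to `sql` and run one JDBC batch.
        List<List<Object>> done = new ArrayList<>(rows);
        rows.clear();
        return done;
    }

    String sql() {
        return sql;
    }
}
```

Compare the data access traversal below, where each call returns a different object, so the chain depends on the shape of four classes at once.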

For a negative example, let's take data access traversal:

yoda.getMidiclorians().getForce().getDarkSide().getLightningFromFingers();

Here, even if the interfaces for all the intervening classes are in the same module, and are very stable, it sure as heck isn't a single logical unit.

Anyway, gotta run, lunch is done. If I think of a better way to describe it, I will do so this evening!

5 writebacks [/src] permanent link

Sun, 24 Feb 2008

Learning, Programming, Etc.

A happy coincidence: The Praggies and O'Reilly are both doing bookamajigs focused on general programmery learning. O'Reilly's is an interesting take in that it is a collaborative, wiki-based venture. Andy Hunt's is triply interesting to me, as I did my graduate work on the stuff he is writing about, if in a very different context (formal education).

Refactoring Your Wetware starts out with a nice review of the Dreyfus model (I grabbed the beta book) but is still mostly not-yet-written, so Andy's approach to progressing through the stages isn't clear yet. I'm very much looking forward to seeing how he approaches and presents the long view of learning.

The O'Reilly approach hits close to home for me as I spent a lot of time experimenting with material from the Portland Pattern Repository when I transitioned back into programmering from teaching and realized I didn't actually remember much! Anything that helps self-taught folks get better is teh win.

0 writebacks [/src] permanent link

Sat, 02 Feb 2008

AtomSub

So AtomPub is a reasonable way to publish things, etc. It would be nice to give a service an AtomPub endpoint as a callback for events. An awfully large number of things can accept HTTP now, and AtomPub provides a reasonable set of basic operations, so why not take advantage of it for callback APIs? Instead of polling a site for updates, post a subscription with an AtomPub endpoint as the entry and let the service push to you. AtomSub :-)
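A rough sketch of what an AtomSub subscription might look like from the subscriber's side. Everything specific here is an assumption for illustration: AtomPub defines no subscription collection and no callback link relation, so `CALLBACK_REL` and whatever URL is passed to `subscribe` are hypothetical. The standard pieces are the Atom namespace, the `application/atom+xml;type=entry` media type, and AtomPub's use of 201 Created for a successful POST to a collection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class AtomSub {
    // Invented link relation for this sketch; AtomPub defines no such rel.
    static final String CALLBACK_REL = "http://example.org/rel/atomsub-callback";

    /** Builds an Atom entry whose callback link points at the subscriber's own AtomPub collection. */
    static String subscriptionEntry(String id, String updated, String callbackCollection) {
        return "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
            + "<id>" + xml(id) + "</id>"
            + "<title>subscription</title>"
            + "<updated>" + xml(updated) + "</updated>"
            + "<author><name>subscriber</name></author>"
            + "<link rel=\"" + CALLBACK_REL + "\" href=\"" + xml(callbackCollection) + "\"/>"
            + "<content type=\"text\">AtomSub subscription</content>"
            + "</entry>";
    }

    /** POSTs the entry to a (hypothetical) subscriptions collection and returns the HTTP status. */
    static int subscribe(URL subscriptions, String entry) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) subscriptions.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/atom+xml;type=entry");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(entry.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // AtomPub answers a successful create with 201 Created
    }

    // Minimal XML escaping for text and attribute values.
    private static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The service would then push updates by POSTing entries to the callback collection, the same way any AtomPub client publishes.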

3 writebacks [/src] permanent link

Sat, 26 Jan 2008

an interesting milestone: mod_slow

Crossed some kind of threshold today, I am sure. I needed a quick'n'dirty web server hack so broke out C for an apache module! What is happening to me?!

Basically, I needed something to put behind a proxy to do some load and capacity testing of the proxy. As I wanted to have things like the size of the response and time of the response be easily configurable on the load generator I needed to hack something up...

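#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"
#include "apr_time.h"
#include "apr_strings.h"

/* Sleep for the number of milliseconds given as the query string
 * (e.g. /big.html?2000), then decline so the default handler still
 * serves the file. apr_sleep() takes microseconds, hence the * 1000. */
static int handler(request_rec *r)
{
    if (r->args)
        apr_sleep(apr_atoi64(r->args) * 1000);
    return DECLINED;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_handler(handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA slow_module = {
    STANDARD20_MODULE_STUFF,
    NULL,               /* create per-directory config */
    NULL,               /* merge per-directory config */
    NULL,               /* create per-server config */
    NULL,               /* merge per-server config */
    NULL,               /* configuration directives */
    register_hooks      /* register hooks */
};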

This very nicely lets me drop artificial slowdowns in front of the default handler (serve up files), so I can control "processing time" and file size (pick the file with the size I want): http://binky/big.html?2000 Sweet! Am kind of floored that the first solution which leapt to mind for me was an Apache module in C, though!

For some reason, putting the sleep in fixups doubled the sleep time, so I made it a declined handler and things worked fine. Need to figure out why... someday.

2 writebacks [/src/apache] permanent link

Mon, 21 Jan 2008

IO Heresy

A recent thread on the Apache HTTPD development mailing list reminded me of something funny. Orthodox Server Programmerism states that events are better than threads. The funny thing is that just as this meme has finally broken into the mainstream (over the last several years), it has become largely irrelevant. Even better, it is only going to become more and more irrelevant as time passes.

OS kernels, even ones that love events and hate threads, now do threads very efficiently. On top of that, many-processor, multi-core computers are the norm (heck, my laptop is dual-core), and this trend is going to increase very quickly.

I suspect events vs. threads is going to go the way of Assembly vs. C (or C vs. Java (or Java vs. The P(+R) gang)). Sure, you can theoretically optimize the former better, but "theoretically" will be the operative word in the vast majority of cases. Heck, I hope thread schedulers use events when a thread blocks on IO. Presuming they do, the penalty for a thread vs. an event listener should be the thread stack and restoring registers. The thread stack is just a specific solution to storing the context for handling the event, and I don't know nearly enough to dive any deeper :-)
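A toy Java sketch of the point that the thread stack is a context store. It is not a server: a hypothetical request arrives as string chunks terminated by an empty chunk, and both handlers just total the characters. In the thread style the running total is a local variable kept alive on the blocked thread's stack; in the event style the same state has to be hoisted into an explicit `Context` object that an (assumed) event loop passes back to the callback for every chunk.

```java
import java.util.concurrent.BlockingQueue;

final class ContextDemo {
    // Thread style: per-request state is a plain local variable. While take()
    // blocks, the thread's stack is what keeps `total` alive.
    static int threadStyle(BlockingQueue<String> chunks) throws InterruptedException {
        int total = 0;
        for (String chunk = chunks.take(); !chunk.isEmpty(); chunk = chunks.take()) {
            total += chunk.length();
        }
        return total;
    }

    // Event style: the same state lives in an explicit object that the
    // (hypothetical) event loop hands back to the callback on each readiness event.
    static final class Context {
        int total;
        boolean done;
    }

    static void onChunk(Context ctx, String chunk) {
        if (chunk.isEmpty()) {
            ctx.done = true;
        } else {
            ctx.total += chunk.length();
        }
    }
}
```

Both compute the same answer; the only difference is where the in-progress state lives, which is exactly the cost being weighed above.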

Heck, anyone designing a new system for embarrassingly concurrent stuff today would probably be better off solving this in the compiler and exposing linear (sequential) programming to the programmer, via explicit happens-before semantics, rather than a thread model.

2 writebacks [/src] permanent link

Mon, 14 Jan 2008

Nu is Sweet!

Ran across Nu today via a reference from Brandon Warner. Nu is an interpreted Lisp dialect with close ties to Ruby and Objective-C. Really :-)

The best way to illustrate this is probably to look at a snippet, in this case from the nuke tool bundled with Nu:

(unless @prefix
        (set @prefix 
             "#{((((NSProcessInfo processInfo) arguments) 0) dirName)}.."))

(unless @icon_files 
        (set @icon_files 
             (array "#{@prefix}/share/nu/resources/nu.icns")))

I won't point out the Objective-C and Ruby bits therein; if you know one or both, you'll see them. It looks weird in places, but if you want to hack around on Cocoa stuff, wowzers, it rocks. Check out the converted form of ye olde currency converter (the first bit of Cocoa programming for a lot of folks, myself included).

0 writebacks [/src] permanent link

Sat, 29 Dec 2007

Autotools are the Devil

Sadly, mod_wombat is stalled because I cannot figure out autotools, nor can I find anyone who understands autotools. Apparently everything is built by copying the macros from something else, tweaking them... and praying.

I am coming around to building it with SCons, or, if Jan finishes it, his Lua-based autotools replacement. I am pretty sure waiting for him to build it, then learning it, will be faster than making autotools do what I want (i.e., work).

4 writebacks [/src] permanent link

Fri, 21 Dec 2007

Loving Scala: Scripting

So, I generally have two classes of scripting needs: systemy/unixy stuff and systemy/application stuff. Most of the applications I work with at $dayjob are Java, so over the last year or two that has meant Ruby (MRI or JRuby, as the case may be). Tonight, prodded by the new book, I broke out Scala for a bunch of in-REPL statistics crunching using the wonderful commons-math library.

I love it. Don't know if I love it enough to replace JRuby for typical Java-interaction hackery, but enough that I am going to switch back and forth for a while so I can decide.

Hadn't actually used Scala in a few years, so it is great to break it out again!

6 writebacks [/src/scala] permanent link

Wed, 12 Dec 2007

Shindig: An OpenSocial Container

So a bunch of folks are contributing to a set of open source implementations of OpenSocial at Apache, called Shindig. There isn't much of a site yet, but there is a pile of working code! Much more information is over on Google's OpenSocial Blog.

2 writebacks [/src] permanent link

Fri, 07 Dec 2007

Re: JCE and OpenSSL

Thank you, Julius Davies, for not-yet-commons-ssl, which does exactly what I was looking for:

import org.apache.commons.io.IOUtils;
import org.apache.commons.ssl.OpenSSL;

import java.io.File;
import java.io.FileOutputStream;

public class Foo
{
    public static void main(String[] args) throws Exception
    {
        // Encrypt with not-yet-commons-ssl; the output is base64, which is
        // why the openssl command below is given -a
        File f = new File("/tmp/foo");
        FileOutputStream fout = new FileOutputStream(f);
        try
        {
            fout.write(OpenSSL.encrypt("aes256",
                                       "secret".toCharArray(),
                                       "hello world\n".getBytes("UTF-8")));
        }
        finally
        {
            fout.close();
        }

        // Decrypt the same file with the openssl command-line tool and echo the plaintext
        Process p = Runtime.getRuntime()
            .exec("openssl enc -pass pass:secret -d -aes256 -a -in /tmp/foo");
        System.out.print(IOUtils.toString(p.getInputStream()));
        p.waitFor();
    }
}

Woo hoo!

3 writebacks [/src/java] permanent link

Thu, 29 Nov 2007

Shedding

Shedding is a technique Paul, John, and I hashed out (during the recent ApacheCon hackathon) for service lookup in pure HTTP. The problem it tries to solve is figuring out where to send a request when all you have is a URL, say http://storage.service/session, which is a backend service for managing session state at Hypothetical Incorporated.

With shedding you set up DNS to send all traffic on the .service. domain to a set of shed servers. The shed servers serve up HTTP and respond to all requests with a 302 to the correct location, so our request above would look like:

  --- Request ---                 --- Response ---
  
POST /session HTTP/1.1
Host: storage.service
Content-Length: 0


                                HTTP/1.x 302 Moved
                                Location: http://10.0.1.5/session


POST /session HTTP/1.1
Host: 10.0.1.5
Content-length: 0


                                HTTP/1.x 201 Created
                                Location: http://storage.service/1234
                                Content-length: 0
                                
                                

The shed server basically "sheds" the requests off to other servers. Now, creating this session required an extra request to hit the shed server. Ick. For internal services, though, we can narrow the definition of the 302 response so that it applies only to the host moving. This feels kind of dirty at first, but it gives us a very interesting benefit when combined with long-lived HTTP connections (the kind you use for internal services): you can maintain a pool of connections for a hostname, obtained by following the redirect chain.

As you accumulate connections, you wind up building a pool attached to the different servers that the sheds redirect you to. Knowledge of shed behavior becomes built into the connection pool rather than the client. The client just asks for a connection to storage.service and gets one that points to something capable of servicing it. If there is no connection available, the pool can create a new one.
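A minimal Java sketch of such a pool, assuming the narrowed, host-only meaning of 302. `Connection` is a stand-in for a real keep-alive HTTP connection, and `followRedirects` is a hypothetical hook standing in for walking the shed redirect chain; neither is a real library API. The key detail is that connections are pooled under the logical name the client asked for, not under the address the sheds sent it to.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ShedAwarePool {
    /** Stand-in for a keep-alive HTTP connection. */
    static final class Connection {
        final String logicalHost; // the name the client asked for, e.g. "storage.service"
        final String actualHost;  // where the redirect chain ended, e.g. "10.0.1.5"

        Connection(String logicalHost, String actualHost) {
            this.logicalHost = logicalHost;
            this.actualHost = actualHost;
        }
    }

    private final Map<String, Deque<Connection>> idle = new HashMap<>();
    private final Function<String, String> followRedirects; // logical host -> final host
    private int chainsFollowed = 0;

    ShedAwarePool(Function<String, String> followRedirects) {
        this.followRedirects = followRedirects;
    }

    /** Reuse an idle connection for this name if there is one; otherwise ask the sheds. */
    synchronized Connection acquire(String logicalHost) {
        Deque<Connection> pool = idle.get(logicalHost);
        if (pool != null && !pool.isEmpty()) {
            return pool.pop(); // already points at a server the sheds chose: no redirect needed
        }
        chainsFollowed++;
        return new Connection(logicalHost, followRedirects.apply(logicalHost));
    }

    /** Return a healthy connection, pooled under the logical name rather than the actual host. */
    synchronized void release(Connection c) {
        idle.computeIfAbsent(c.logicalHost, h -> new ArrayDeque<>()).push(c);
    }

    synchronized int chainsFollowed() {
        return chainsFollowed;
    }
}
```

A connection that fails should be discarded rather than released, so the next `acquire` goes back through the sheds and picks up a live instance.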

So, given a client that knows about the narrowed definition of 302, you pay no penalty once the pool is warm; with a client that does not, you make an extra request but still get correct behavior. Even a client that doesn't know about the sheds should be maintaining long-lived connections to both the shed server and the service instance, so you at least avoid the handshakes. Not shabby.

So, that is the client view; let's look at the shed view. For resiliency, the sheds should be allocated in replicated clusters, probably using an eventual-consistency mechanism for maintaining their knowledge of the state of the world. I suspect that a fast-read variant of a Paxos database between them would work fine in practice: reads can be executed against any node and do not get serialized into the state machine, but only see the view as that instance sees it. Or not; I am tired and this is irrelevant to shedding itself :-)

So the shed servers would be arranged into a tree of sheds serving a set of servers, as in this logical diagram. The bottom-level sheds, [A B C] and [X Y Z], we'll call Line Sheds. The top-tier ones, [M N O], we'll call Master Sheds.

The servers serviced by a set of sheds need to announce their availability to the sheds somehow. Various mechanisms could be used; for now, the choice is irrelevant. The line sheds learn about the service instances that register with them and direct traffic to those instances, providing good locality of service in the common case.

If a line shed doesn't know about any instances of a service, it punts up to the master sheds via a 302. The master sheds then 302 to a different set of line sheds. Here is an example, following our logical diagram, with a request originating from Server 1 and sent to Shed A:

      --- Request ---                 --- Response ---

    POST /session HTTP/1.1
    Host: storage.service
    Content-Length: 0
    X-Comment: To Shed A


                                    HTTP/1.x 302 Moved
                                    Location: http://[shed m]/session
                                    X-Comment: From Shed A


    POST /session HTTP/1.1
    Host: [shed m]
    Content-length: 0


                                    HTTP/1.x 302 Moved
                                    Location: http://[shed y]/session
                                    X-Comment: From Shed M


    POST /session HTTP/1.1
    Host: [shed y]
    Content-length: 0
    
    
                                    HTTP/1.x 302 Moved
                                    Location: http://[server 79]/session
                                    X-Comment: From Shed Y

    POST /session HTTP/1.1
    Host: [server 79]
    Content-length: 0


                                    HTTP/1.x 201 Created
                                    Location: http://storage.service/1234
                                    Content-length: 0
                                    X-Comment: From Server 79
                                    

This redirect chain is obviously not ideal, but it will only be followed once per connection establishment; after that, the connection will be pooled and subsequent requests to storage.service will hit Server 79 directly.

To handle this, line sheds need to tell the master sheds each service type they know about, so that the master sheds can properly send delegated redirects. The protocol needs to ensure that an "I have no more X" message has reached the masters BEFORE a line shed sends a delegated redirect up, or the request may be delegated right back to the shed that delegated it!
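An in-memory Java sketch of the line/master shed routing, including the ordering rule. `LineShed`, `MasterShed`, and their methods are hypothetical; `redirect` just returns the URL a shed would put in its 302 `Location` header. Here `deregister` withdraws from the master synchronously before the line shed can ever punt that service upward; across a real network that withdrawal would have to be acknowledged before the line shed starts punting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MasterShed {
    final String name;
    // service -> line sheds that currently have at least one instance of it
    private final Map<String, Set<LineShed>> linesByService = new HashMap<>();

    MasterShed(String name) { this.name = name; }

    synchronized void announce(String service, LineShed line) {
        linesByService.computeIfAbsent(service, s -> new LinkedHashSet<>()).add(line);
    }

    synchronized void withdraw(String service, LineShed line) {
        Set<LineShed> lines = linesByService.get(service);
        if (lines != null) lines.remove(line);
    }

    /** Delegated redirect to a line shed that has the service, or null if none does. */
    synchronized String redirect(String service, String path) {
        Set<LineShed> lines = linesByService.get(service);
        if (lines == null || lines.isEmpty()) {
            return null; // no instances anywhere: a real shed would answer with an error status
        }
        // A real master would prefer the line shed nearest the client; this takes the first.
        return "http://" + lines.iterator().next().name + path;
    }
}

final class LineShed {
    final String name;
    private final MasterShed master;
    // service -> instances registered with this line shed (its local servers)
    private final Map<String, List<String>> instances = new HashMap<>();
    private int next = 0;

    LineShed(String name, MasterShed master) {
        this.name = name;
        this.master = master;
    }

    synchronized void register(String service, String server) {
        List<String> list = instances.computeIfAbsent(service, s -> new ArrayList<>());
        if (list.isEmpty()) {
            master.announce(service, this); // first local instance: tell the master
        }
        list.add(server);
    }

    synchronized void deregister(String service, String server) {
        List<String> list = instances.get(service);
        if (list == null || !list.remove(server)) {
            return;
        }
        if (list.isEmpty()) {
            // "I have no more X" reaches the master before this shed can punt any X
            // request up, so the master cannot delegate it straight back here.
            master.withdraw(service, this);
        }
    }

    /** Where to 302 a request: a local instance if we have one, otherwise the master shed. */
    synchronized String redirect(String service, String path) {
        List<String> list = instances.get(service);
        if (list != null && !list.isEmpty()) {
            return "http://" + list.get(Math.floorMod(next++, list.size())) + path;
        }
        return "http://" + master.name + path;
    }
}
```

Replaying the exchange above: Shed A has no storage.service instances and punts to M, M sends the client to Y, and Y sends it to Server 79.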

To put this scenario in useful context, consider the following physical layout, which corresponds to the logical one used thus far. We have two datacenters, and we want to service requests in the same datacenter if possible but fail over to the remote one if need be. We distribute the line sheds between racks in each datacenter, and distribute the master sheds between datacenters.

But wait, hasn't this problem been solved? Let's revisit the parameters. We want to solve this for internal services, not for the web. We want to take advantage of service locality (pick the service instances nearest the source of the request). We want to be able to add and remove instances of a service very rapidly. We want to stop routing requests to dead instances as rapidly as possible. Finally, we want to minimize load on the network in general.

The first obvious answer is to just use DNS like Paul intended. Sadly, DNS kind of sucks for this when you have a changing service landscape. TTLs of less than a minute tend to be ignored, so a dead instance will stay in rotation for up to a minute, and new instances will only be picked up after a minute. Using a one-minute TTL leads to a lot of extra traffic in the steady-state case. The only "load balancing" facility (without going into a custom DNS client that consumes SRV records) is straight round-robin, and to top it off, clients behave differently with regard to round-robin DNS. You can get locality by only advertising local services, but then if the local services go tits up, you don't make use of more remote services. Basically, DNS can be made to work, but not as well as I want.

Okay, how about pointing DNS at a load balancer? This works pretty well, but the load balancer then becomes your bandwidth-limiting factor, and bandwidth through a load balancer is way pricier than bandwidth through a switch. That said, if you have deep pockets, score!

Fine, how about using a directory server instead of DNS? This would be the Orthodox Java Way (JNDI) and the Orthodox Microsoft Way (AD), and most directory/LDAP implementations update really quickly, are very read-optimized, etc. You lose the use of your URLs, though. That is fine in an RPC/CORBA/RMI/Thrift world but sucks if you are in HTTP land.

Given the constraints outlined, I think shedding has some merit. Feedback, particularly on why this could never work, is extremely appreciated!

8 writebacks [/src] permanent link

Sat, 24 Nov 2007

Interactive (Web) Application Architecture Patterns

Well worth reading. I'll be sending this link to people when they first get all excited about MVC so that we can use the same terms and references.

0 writebacks [/src] permanent link

Thu, 15 Nov 2007

mod_wombat talk

Slides from my talk on mod_wombat :-) at ApacheCon. The talk went well, but I tried to squeeze in too much material. Someone pointed out to me that mod_perl isn't as good as I thought it was: it needs multiple interpreters in threaded MPMs. That makes mod_wombat way more useful and important than even I thought, as it leaves... nothing, except maybe mod_tcl, for doing the micro-module stuff in worker or event.

1 writebacks [/src/wombat] permanent link

Wed, 17 Oct 2007

The Far End of the Long Tail of Itches

There is a 3 member community for using GNU Emacs as a video editor...

1 writebacks [/src] permanent link