Brian's Waste of Time

Wed, 30 Jul 2008

Using Virtual Nodes to Compact Vector Clocks

One hiccup encountered when using vector clocks is that there is no inherent way of reducing the size of the clock. Basically, any node which acts as a causal agent of change has the potential to be forever recorded in the clock. This leads to unbounded clock size, with time. Most systems tend to have a limited number of causal, or lead, nodes providing clock values so it is avoided, but sometimes you don't have that.

When vector clocks are used to track causality in a storage system, such as in Amazon's Dynamo system, it becomes possible to create syncronization points in the history of the element, between storage nodes, if the storage nodes are able to form consensus between themselves on the value of an element at a specific point in the elements history. If we are talking an eventually consistent system, this can be done by using a background syncronization and merge algorithm which merges acausal changes in the background. Alternately, it could be client resolved, in systems like Dynamo, but that isn't my problem, so... I digress.

When the system believes it has a value at a given clock value, where the clock is causally related to the unified value on the other storage nodes holding the element, it can try to achieve concensus about this, and if successful, increment an artifical clock key which we'll call the epoch. If successful, the epoch value subsumes the vector clock values associated with the epoch in the element, shrinking the element's clock.

To run through an example, let's say we have a system which uses three storage nodes for each element. We don't care exactly how these elements values are assigned, except to recognize that it allows for non-causally related changes to occur. At a given point in time the storage nodes may have values for an element A, as follows:

NodeClock
red[red:2, green:1]
blue[red:2, green:1, blue:2]
green[red:3, green:1, blue:1]

A Paxos instance may be executed proposing that epoch:1 be [red:2, green:1]. As each node can agree that [red:2, green:1] comes before its value, it can accept the epoch value. Upon acceptance of the value, the clocks would become:

NodeClock
red[epoch:1]
blue[epoch:1, blue:2]
green[epoch:1, red:3, blue:1]

Assuming a background reconciliation protocol, a system could apply an appropriate heuristic to decide when to atempt to increment the epoch. A good example of such would be after unrelated values have been successfully merged. When it makes sense, and how to back-off to older clock values really depends on the characteristics of the system being designed and how it will be used.

As pointed out in the Dynamo paper, systems where there tend to be a small number of keys in the clock don't generally have this problem, Dynamo avoids it by causing the clock keys to be based on a small number of likely coordinator nodes:

To this end, Dynamo employs the following clock truncation scheme: Along with each (node, counter) pair, Dynamo stores a timestamp that indicates the last time the node updated the data item. When the number of (node, counter) pairs in the vector clock reaches a threshold (say 10), the oldest pair is removed from the clock. Clearly, this truncation scheme can lead to inefficiencies in reconciliation as the descendant relationships cannot be derived accurately. However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated.

In something like that, it may not make a lot of sense -- the problem just doesn't tend to come up. On the other hand, other systems, such as one which uses a user id, or session id, as a clock key would tend to generate larger clocks. This kind of keying can be useful for providing read-what-I-wrote consistency, but that is another discussion :-)

0 writebacks [/src] permanent link

Mon, 23 Jun 2008

Library Versioning, Redux

I am a big fan of the APR versioning guidelines, but there is an element they don't capture well somewhere between major (backwards incompat change) and minor (forward incompat change) in Java. If you follow the, generally recommended practice of exposing things via interfaces (pure virtual classes), you have opened the door for users to implement those interfaces.

In a C-style world, adding a function to a library would bump you from 1.6 to 1.7, using APR guidelines. In an interface-driven Java-style world, adding a method to an interface would bump you from 1.6 to 2.0. Or would it?

To take a concrete example, a coworker (thanks Jax!) recently re-added first class support for callable statements to jDBI. jDBI uses a Handle interface to expose operations against a database. It has gained a method:

public <ReturnType> Call<ReturnType> createCall(String callableSql, 
                                     CallableStatementMapper<ReturnType> mapper);

If you implement this interface, the change is backwards incompatible. An implementation of Handle made against 2.2.2 will not compile against this. On the other hand, the intent of the library is not for people to implement Handle, it is to expose the libraries functionality. It is almost a header file.

So, 2.3 or 3.0?

3 writebacks [/src/java] permanent link

Sat, 14 Jun 2008

yasnippet

I am quite liking yasnippet so far. I have converted my bloggie stuff over from textmate to it, and it works darned nicely. Snippet definition is easy, and clear. Woot!

0 writebacks [/emacs] permanent link

Thu, 15 May 2008

Topology Aware Consistency Policies

I am increasingly fascinated by consistency options, in a distributed storage system, made available by topology awareness on the client. For example, if you consider a write committed iff the write has been made to a majority of all storage nodes and a majority of the local nodes, where local would typically be "same datacenter," it allows you to achieve repeatable read read what you wrote consistency locally when a majority of local nodes have responded to a read request with a matching response, while still providing overall consistency across the entire system.

0 writebacks [/src] permanent link

Sat, 10 May 2008

Long Tail Treasure Trove Slides!

Gianugo has posted the slides from our JavaOne presentation, on Slideshare and in pdf form. The talk was awesome to give, we had a killer audience. A huge thank you to all who attended!

2 writebacks [/src/java] permanent link

Wed, 23 Apr 2008

The Shape of Async Callback APIs

When we have async callbacks in a Java API, the idiommatic way of writing the interface to register the callback looks like:

Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
})

I'd like to propose that we adopt a new idiom, which is to pass an Executor along with the listener:

Executor myExecutor = Executors.newSingleThreadExecutor();
// ...
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
}, myExecutor);

The main benefit is that you give the caller control over the threading model for the callback. Right now, most libraries either have a separate thread pool for callbacks, or make the callback on the event generator thread. Usually there is nothing but an obscure reference on a wiki to indicate the behavior.

2 writebacks [/src/java] permanent link

Thu, 17 Apr 2008

My Favorite Bash Completion

As I got hit with a meme about command line stuff, I figured I'd share an update to my favorite bash completion:

SSH_COMPLETE=( $(cut -f1 -d' ' ~/.ssh/known_hosts |\
                 tr ',' '\n' |\
                 sort -u |\
                 grep -e '[:alpha:]') )
complete -o default -W "${SSH_COMPLETE[*]}" ssh

If you ssh directly to IP addresses very often, you might want to leave off the last grep -e.

Not going to tag anyone, but if you have a favorite completion, please share! (I suggest not in a comment on this post as my comment system does not preserve any formatting).

5 writebacks [/stuff] permanent link

That History Thing

bakert tagged me, so:

brianm@binky:~$ history | awk {'print $2'} | sort | uniq -c | sort -k1 -rn | head
 164 svn
  52 cd
  42 ssh
  32 sudo
  22 git
  16 ls
  16 for
  14 echo
  13 man
  10 curl
brianm@binky:~$

Sadly, I only seem to keep a 500 line .history -- need to fix that.

0 writebacks [/stuff] permanent link

Tue, 15 Apr 2008

If We Had to Drop Java

So, thought experiment. If we, as an industry, had to drop Java, the language and the virtual machine, for some reason, what could really move into its niche?

Some points to consider:

Putting aside the "damn I want to use coolness X," what out there provides something that could do it?

4 writebacks [/src] permanent link

Sun, 16 Mar 2008

mod_wombat and the GSoC

Nathan wrote up a great blog post about thoughts for working on mod_wombat (Lua in Apache) for this coming Google Summer of Code. I'd be extremely excited (along with Nathan and Matthew, I suspect) to mentor someone on it if it sounds exciting to folks out there :-)

0 writebacks [/src/wombat] permanent link

Sat, 01 Mar 2008

Revisiting Groovy

I haven't actually used Groovy much since, oh, egads, umh, 2004. It was at 1.0b6 and was taking a direction which I both disagreed with and found kind of boring. It was throwing away large chunks of dynamicity for some performance gains as it decided it really wanted to be compiled, after all. Large chunks of the new syntax I also disagreed with so... I wandered away, wishing everyone luck.

Well, funny things can happen in, er, three and a half years, so when a coworker suggested we look at Groovy for solving a problem my initial reaction was "erk, umh, I kind of like our use of JRuby for that" but Groovy wasn't even at 1.0 when last I used it, so it was a pretty unfair reaction. Looking again, it is at 1.5.4! Time to revisit!

After noting that none of my old code parsed, I worked through the tutorial. This isn't the same language I last used. It smells like Perl or PHP more than Ruby, which it rather resembled back then. Overall, my second "first" impression: totally practical, rather ornery. Will dig into it more.

0 writebacks [/src/groovy] permanent link

Wed, 27 Feb 2008

Method Chaining

A coworker commented to me today "what's up with all these libraries that encourage method chaining? ;-)" when we were talking about FEST. To stay in context, we are talking about this kind of thing:

assertThat(yoda).isInstanceOf(Jedi.class)
                .isEqualTo(foundJedi)
                .isNotEqualTo(foundSith);

This, of course, has also been called nice things like "train wreck" and is frequently seen to be a brittleness inducer in code. On the other hand, I encourage the heck out of it in libraries I write, from jDBI for example:

handle.prepareBatch("insert into something (id, name) values (:id, :name)")
        .add(1, "Brian")
        .add(2, "Keith")
        .add(3, "Eric")
        .execute();

On yet another hand, I pointed out that it was a bad practice to someone in a code review just last week. So, when is it a good fluent interface, and when is it a train wreck? Good question. My first reaction is "I know it when I see it" but that isn't very useful. So, to take a stab at a description...

Method chaining makes a good interface when the chained methods all come from the same module, are part of the published API, and when taken together represent a single logical action. In the first example, they are all on the published interface of FEST-Assert and are asserting that yoda is correct. In the second, they all come from the published interfaces of jDBI and form one batch statement.

For a negative example, let's take data access traversal:

yoda.getMidiclorians().getForce().getDarkSide().getLightningFromFingers();

Here, even if the interfaces for all the intervening classes are in the same module, and are very stable, it sure as heck isn't a single logical unit.

Anyway, gotta run, lunch is done. If I think of a better way to describe it will do so this evening!

5 writebacks [/src] permanent link

Sun, 24 Feb 2008

Learning, Programming, Etc.

A happy coincidence of The Praggies and O'Reilly both doing bookamajigs focused on general, programmery, learning. O'Reilly's is an interesting take in that it is a collaborative, wiki-based venture. Andy Hunt's is triply interesting to me as I did my graduate work on the stuff he is writing about, if in a very different context (formal education).

Refactoring Your Wetware starts out with a nice review of the Dreyfus model (I grabbed the beta book) but is still mostly not-yet-written, so Andy's approach to progressing through the stages isn't clear, yet. I'm very much looking forward to seeing how he approaches and presents the long view of learning.

The O'Reilly approach hits close to home for me as I spent a lot of time experimenting with material from the Portland Pattern Repository when I transitioned back into programmering from teaching and realized I didn't actually remember much! Anything that helps self-taught folks get better is teh win.

0 writebacks [/src] permanent link

Sat, 02 Feb 2008

AtomSub

So AtomPub is a reasonable way to publish things, etc. Would be nice to push an AtomPub endpoint to a service as a callback for events. An awfully large number of things can accept HTTP now, and there is a reasonable basic-operation system available, so why not take advantage for callback APIs? Instead of polling a site for updates, post a subscription with an AtomPub endpoint as the entry and let the service push to you. AtomSub :-)

3 writebacks [/src] permanent link

Sat, 26 Jan 2008

an interesting milestone: mod_slow

Crossed some kind of threshold today, I am sure. I needed a quick'n'dirty web server hack so broke out C for an apache module! What is happening to me?!

Basically, I needed something to put behind a proxy to do some load and capacity testing of the proxy. As I wanted to have things like the size of the response and time of the response be easily configurable on the load generator I needed to hack something up...

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"
#include "apr_time.h"
#include "apr_strings.h"

static int handler(request_rec *r)
{
    if (r->args)
        apr_sleep(apr_atoi64(r->args) * 1000);
    return DECLINED;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_handler(handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA slow_module = {
    STANDARD20_MODULE_STUFF, 
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    register_hooks
};

This very nicely lets me drop artificial slowdowns in front of the the default handler (serve up files) so I can control "processing time" and file size (pick the file with the size I want): http://binky/big.html?2000 Sweet! Am kind of floored that the first solution which leapt to mind for me was an apache module in C, though!

For some reason, putting the sleep in fixups doubled the sleep time, so I made it a declined handler and things worked fine. Need to figure out why.... someday.

2 writebacks [/src/apache] permanent link

Mon, 21 Jan 2008

IO Heresy

A recent thread on the Apache HTTPD development mailing list reminded me of something funny. Orthodox Server Programmerism states that events are better than threads. Funny thing is that at the same time this meme has finally broken into the mainstream (the last several years) it has become largely irrelevant. Even better, it is only going to become more and more irrelevant as time passes.

OS kernels, even ones that love events and hate threads now do threads very efficiently. On top of that, many-processor, multi-core computers are the norm (heck, my laptop is dual-core) and this trend is going to increase very quickly.

I suspect events vs. threads is going to go the way of Assembly vs. C (or C vs. Java (or Java vs. The P(+R) gang)). Sure you can theoretically optimize the former better, but the theoretically will be operative in the vast majority of cases. Heck, I hope thread schedulers use events when a thread blocks on IO. Presuming they do, the penalty for a thread vs an event listener should be the thread stack and restoring registers. The thread stack is a specific solution to storing the context for handling the event, and I don't know close to enough to dive any deeper :-)

Heck, anyone designing a new system for embarassingly concurrent stuff today would probably be better off solving this in the compiler and then exposing linear programming to the programmer via explicit happens-before semantics rather than a thread model.

2 writebacks [/src] permanent link

Thu, 17 Jan 2008

Inter-graph-amajig

Nick and I had a good IM discussion last night about the social graph, data portability, etc. Timely in light of

0 writebacks [/tech] permanent link

Mon, 14 Jan 2008

My Current Pen Hack

It came up in a different context, so thought I would post :-) If you trim a tiny smidge from the butt end of a uni-ball Signo 207 refill it fits very nicely into the Pilot G2 Pro body. The G2 Pro body is my favorite everyday pen body, but the G2 cartridges (what do you call them when they include the ball, anyway?) tend to be uneven and even blotchy on me compared to the 207s. So, a couple seconds with a pocketknife and I have the best of both worlds!

1 writebacks [/stuff] permanent link

Nu is Sweet!

Ran across Nu today in a reference from Brandon Warner. Nu is an interpreted lisp dialect with close ties to ruby and objective-c. Really :-)

The best way to illustrate this is probably to look at a snippet, in this case from the nuke tool bundled with Nu:

(unless @prefix
        (set @prefix 
             "#{((((NSProcessInfo processInfo) arguments) 0) dirName)}.."))

(unless @icon_files 
        (set @icon_files 
             (array "#{@prefix}/share/nu/resources/nu.icns")))

I won't point out the objc and ruby bits therein, as if you know one or both, you see them. It looks weird in places, but if you want to hack around on cocoa stuff, wowzers, it rocks. Check the converted form of ye olde currency converter (the first bit of cocoa programming for a lot of folks, myself included).

0 writebacks [/src] permanent link

Thu, 03 Jan 2008

Goal: Learn Emacs in 2008

Right now I am functional enough in emacs to do basic things, like write this blog post, but I use too many OS X-isms, like splat-z for undo, and I need to turn those off and learn emacs properly :-)

6 writebacks [/stuff] permanent link