Using Virtual Nodes to Compact Vector Clocks
One hiccup encountered when using vector clocks is that there is no inherent way of reducing the size of the clock. Basically, any node which acts as a causal agent of change has the potential to be forever recorded in the clock. This leads to unbounded clock size, with time. Most systems tend to have a limited number of causal, or lead, nodes providing clock values so it is avoided, but sometimes you don't have that.
When vector clocks are used to track causality in a storage system, such as in Amazon's Dynamo system, it becomes possible to create syncronization points in the history of the element, between storage nodes, if the storage nodes are able to form consensus between themselves on the value of an element at a specific point in the elements history. If we are talking an eventually consistent system, this can be done by using a background syncronization and merge algorithm which merges acausal changes in the background. Alternately, it could be client resolved, in systems like Dynamo, but that isn't my problem, so... I digress.
When the system believes it has a value at a given clock value, where the clock is causally related to the unified value on the other storage nodes holding the element, it can try to achieve concensus about this, and if successful, increment an artifical clock key which we'll call the epoch. If successful, the epoch value subsumes the vector clock values associated with the epoch in the element, shrinking the element's clock.
To run through an example, let's say we have a system which uses three storage nodes for each element. We don't care exactly how these elements values are assigned, except to recognize that it allows for non-causally related changes to occur. At a given point in time the storage nodes may have values for an element A, as follows:
| Node | Clock |
|---|---|
| red | [red:2, green:1] |
| blue | [red:2, green:1, blue:2] |
| green | [red:3, green:1, blue:1] |
A Paxos instance may be executed proposing that epoch:1 be [red:2, green:1]. As each node can agree that [red:2, green:1] comes before its value, it can accept the epoch value. Upon acceptance of the value, the clocks would become:
| Node | Clock |
|---|---|
| red | [epoch:1] |
| blue | [epoch:1, blue:2] |
| green | [epoch:1, red:3, blue:1] |
Assuming a background reconciliation protocol, a system could apply an appropriate heuristic to decide when to atempt to increment the epoch. A good example of such would be after unrelated values have been successfully merged. When it makes sense, and how to back-off to older clock values really depends on the characteristics of the system being designed and how it will be used.
As pointed out in the Dynamo paper, systems where there tend to be a small number of keys in the clock don't generally have this problem, Dynamo avoids it by causing the clock keys to be based on a small number of likely coordinator nodes:
To this end, Dynamo employs the following clock truncation scheme: Along with each (node, counter) pair, Dynamo stores a timestamp that indicates the last time the node updated the data item. When the number of (node, counter) pairs in the vector clock reaches a threshold (say 10), the oldest pair is removed from the clock. Clearly, this truncation scheme can lead to inefficiencies in reconciliation as the descendant relationships cannot be derived accurately. However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated.
In something like that, it may not make a lot of sense -- the problem just doesn't tend to come up. On the other hand, other systems, such as one which uses a user id, or session id, as a clock key would tend to generate larger clocks. This kind of keying can be useful for providing read-what-I-wrote consistency, but that is another discussion :-)
0 writebacks [/src] permanent link
Mon, 23 Jun 2008I am a big fan of the APR versioning guidelines, but there is an element they don't capture well somewhere between major (backwards incompat change) and minor (forward incompat change) in Java. If you follow the, generally recommended practice of exposing things via interfaces (pure virtual classes), you have opened the door for users to implement those interfaces.
In a C-style world, adding a function to a library would bump you from 1.6 to 1.7, using APR guidelines. In an interface-driven Java-style world, adding a method to an interface would bump you from 1.6 to 2.0. Or would it?
To take a concrete example, a coworker (thanks Jax!) recently
re-added first class support for callable statements to jDBI. jDBI
uses a Handle interface to expose operations against a
database. It has gained a method:
public <ReturnType> Call<ReturnType> createCall(String callableSql,
CallableStatementMapper<ReturnType> mapper);
If you implement this interface, the change is backwards
incompatible. An implementation of Handle made against
2.2.2 will not compile against this. On the other hand, the intent
of the library is not for people to implement Handle,
it is to expose the libraries functionality. It is almost a
header file.
So, 2.3 or 3.0?
3 writebacks [/src/java] permanent link
Sat, 14 Jun 2008I am quite liking yasnippet so far. I have converted my bloggie stuff over from textmate to it, and it works darned nicely. Snippet definition is easy, and clear. Woot!
0 writebacks [/emacs] permanent link
Thu, 15 May 2008
Topology Aware Consistency Policies
I am increasingly fascinated by consistency options, in a distributed
storage system, made available by topology awareness on the
client. For example, if you consider a write committed iff the
write has been made to a majority of all storage nodes and a
majority of the local nodes, where local would typically be "same
datacenter," it allows you to achieve repeatable read
read what you wrote
consistency locally when a majority of local nodes have responded
to a read request with a matching response, while still providing
overall consistency across the entire system.
0 writebacks [/src] permanent link
Sat, 10 May 2008
Long Tail Treasure Trove Slides!
Gianugo has posted the slides from our JavaOne presentation, on Slideshare and in pdf form. The talk was awesome to give, we had a killer audience. A huge thank you to all who attended!
2 writebacks [/src/java] permanent link
Wed, 23 Apr 2008
The Shape of Async Callback APIs
When we have async callbacks in a Java API, the idiommatic way of writing the interface to register the callback looks like:
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
public Foo onEvent(Event e) {
return new Foo(e.getSomethingNifty());
}
})
I'd like to propose that we adopt a new idiom, which is to pass an
Executor along with the listener:
Executor myExecutor = Executors.newSingleThreadExecutor();
// ...
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
public Foo onEvent(Event e) {
return new Foo(e.getSomethingNifty());
}
}, myExecutor);
The main benefit is that you give the caller control over the threading model for the callback. Right now, most libraries either have a separate thread pool for callbacks, or make the callback on the event generator thread. Usually there is nothing but an obscure reference on a wiki to indicate the behavior.
2 writebacks [/src/java] permanent link
Thu, 17 Apr 2008As I got hit with a meme about command line stuff, I figured I'd share an update to my favorite bash completion:
SSH_COMPLETE=( $(cut -f1 -d' ' ~/.ssh/known_hosts |\
tr ',' '\n' |\
sort -u |\
grep -e '[:alpha:]') )
complete -o default -W "${SSH_COMPLETE[*]}" ssh
If you ssh directly to IP addresses very often, you might want to leave
off the last grep -e.
Not going to tag anyone, but if you have a favorite completion, please share! (I suggest not in a comment on this post as my comment system does not preserve any formatting).
5 writebacks [/stuff] permanent link
bakert tagged me, so:
brianm@binky:~$ history | awk {'print $2'} | sort | uniq -c | sort -k1 -rn | head
164 svn
52 cd
42 ssh
32 sudo
22 git
16 ls
16 for
14 echo
13 man
10 curl
brianm@binky:~$
Sadly, I only seem to keep a 500 line .history -- need to fix that.
0 writebacks [/stuff] permanent link
Tue, 15 Apr 2008So, thought experiment. If we, as an industry, had to drop Java, the language and the virtual machine, for some reason, what could really move into its niche?
Some points to consider:
Putting aside the "damn I want to use coolness X," what out there provides something that could do it?
4 writebacks [/src] permanent link
Sun, 16 Mar 2008Nathan wrote up a great blog post about thoughts for working on mod_wombat (Lua in Apache) for this coming Google Summer of Code. I'd be extremely excited (along with Nathan and Matthew, I suspect) to mentor someone on it if it sounds exciting to folks out there :-)
0 writebacks [/src/wombat] permanent link
Sat, 01 Mar 2008I haven't actually used Groovy much since, oh, egads, umh, 2004. It was at 1.0b6 and was taking a direction which I both disagreed with and found kind of boring. It was throwing away large chunks of dynamicity for some performance gains as it decided it really wanted to be compiled, after all. Large chunks of the new syntax I also disagreed with so... I wandered away, wishing everyone luck.
Well, funny things can happen in, er, three and a half years, so when a coworker suggested we look at Groovy for solving a problem my initial reaction was "erk, umh, I kind of like our use of JRuby for that" but Groovy wasn't even at 1.0 when last I used it, so it was a pretty unfair reaction. Looking again, it is at 1.5.4! Time to revisit!
After noting that none of my old code parsed, I worked through the tutorial. This isn't the same language I last used. It smells like Perl or PHP more than Ruby, which it rather resembled back then. Overall, my second "first" impression: totally practical, rather ornery. Will dig into it more.
0 writebacks [/src/groovy] permanent link
Wed, 27 Feb 2008A coworker commented to me today "what's up with all these libraries that encourage method chaining? ;-)" when we were talking about FEST. To stay in context, we are talking about this kind of thing:
assertThat(yoda).isInstanceOf(Jedi.class)
.isEqualTo(foundJedi)
.isNotEqualTo(foundSith);
This, of course, has also been called nice things like "train wreck" and is frequently seen to be a brittleness inducer in code. On the other hand, I encourage the heck out of it in libraries I write, from jDBI for example:
handle.prepareBatch("insert into something (id, name) values (:id, :name)")
.add(1, "Brian")
.add(2, "Keith")
.add(3, "Eric")
.execute();
On yet another hand, I pointed out that it was a bad practice to someone in a code review just last week. So, when is it a good fluent interface, and when is it a train wreck? Good question. My first reaction is "I know it when I see it" but that isn't very useful. So, to take a stab at a description...
Method chaining makes a good interface when the chained methods all
come from the same module, are part of the published API, and when taken
together represent a single logical action. In the first example, they are
all on the published interface of FEST-Assert and are asserting that
yoda is correct. In the second, they all come from the
published interfaces of jDBI and form one batch statement.
For a negative example, let's take data access traversal:
yoda.getMidiclorians().getForce().getDarkSide().getLightningFromFingers();
Here, even if the interfaces for all the intervening classes are in the same module, and are very stable, it sure as heck isn't a single logical unit.
Anyway, gotta run, lunch is done. If I think of a better way to describe it will do so this evening!
5 writebacks [/src] permanent link
Sun, 24 Feb 2008A happy coincidence of The Praggies and O'Reilly both doing bookamajigs focused on general, programmery, learning. O'Reilly's is an interesting take in that it is a collaborative, wiki-based venture. Andy Hunt's is triply interesting to me as I did my graduate work on the stuff he is writing about, if in a very different context (formal education).
Refactoring Your Wetware starts out with a nice review of the Dreyfus model (I grabbed the beta book) but is still mostly not-yet-written, so Andy's approach to progressing through the stages isn't clear, yet. I'm very much looking forward to seeing how he approaches and presents the long view of learning.
The O'Reilly approach hits close to home for me as I spent a lot of time experimenting with material from the Portland Pattern Repository when I transitioned back into programmering from teaching and realized I didn't actually remember much! Anything that helps self-taught folks get better is teh win.
0 writebacks [/src] permanent link
Sat, 02 Feb 2008So AtomPub is a reasonable way to publish things, etc. Would be nice to push an AtomPub endpoint to a service as a callback for events. An awfully large number of things can accept HTTP now, and there is a reasonable basic-operation system available, so why not take advantage for callback APIs? Instead of polling a site for updates, post a subscription with an AtomPub endpoint as the entry and let the service push to you. AtomSub :-)
3 writebacks [/src] permanent link
Sat, 26 Jan 2008
an interesting milestone: mod_slow
Crossed some kind of threshold today, I am sure. I needed a quick'n'dirty web server hack so broke out C for an apache module! What is happening to me?!
Basically, I needed something to put behind a proxy to do some load and capacity testing of the proxy. As I wanted to have things like the size of the response and time of the response be easily configurable on the load generator I needed to hack something up...
#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"
#include "apr_time.h"
#include "apr_strings.h"
static int handler(request_rec *r)
{
if (r->args)
apr_sleep(apr_atoi64(r->args) * 1000);
return DECLINED;
}
static void register_hooks(apr_pool_t *p)
{
ap_hook_handler(handler, NULL, NULL, APR_HOOK_MIDDLE);
}
module AP_MODULE_DECLARE_DATA slow_module = {
STANDARD20_MODULE_STUFF,
NULL,
NULL,
NULL,
NULL,
NULL,
register_hooks
};
This very nicely lets me drop artificial slowdowns in front of the the
default handler (serve up files) so I can control "processing time" and
file size (pick the file with the size I want):
http://binky/big.html?2000
Sweet! Am kind of floored that the first solution which leapt to mind for
me was an apache module in C, though!
For some reason, putting the sleep in fixups doubled the
sleep time, so I made it a declined handler and things worked fine. Need
to figure out why.... someday.
2 writebacks [/src/apache] permanent link
Mon, 21 Jan 2008
A recent thread on the Apache HTTPD development mailing list reminded me of something funny. Orthodox Server Programmerism states that events are better than threads. Funny thing is that at the same time this meme has finally broken into the mainstream (the last several years) it has become largely irrelevant. Even better, it is only going to become more and more irrelevant as time passes.
OS kernels, even ones that love events and hate threads now do threads very efficiently. On top of that, many-processor, multi-core computers are the norm (heck, my laptop is dual-core) and this trend is going to increase very quickly.
I suspect events vs. threads is going to go the way of Assembly vs. C (or C vs. Java (or Java vs. The P(+R) gang)). Sure you can theoretically optimize the former better, but the theoretically will be operative in the vast majority of cases. Heck, I hope thread schedulers use events when a thread blocks on IO. Presuming they do, the penalty for a thread vs an event listener should be the thread stack and restoring registers. The thread stack is a specific solution to storing the context for handling the event, and I don't know close to enough to dive any deeper :-)
Heck, anyone designing a new system for embarassingly concurrent stuff today would probably be better off solving this in the compiler and then exposing linear programming to the programmer via explicit happens-before semantics rather than a thread model.
2 writebacks [/src] permanent link
Thu, 17 Jan 2008Nick and I had a good IM discussion last night about the social graph, data portability, etc. Timely in light of
0 writebacks [/tech] permanent link
Mon, 14 Jan 2008It came up in a different context, so thought I would post :-) If you trim a tiny smidge from the butt end of a uni-ball Signo 207 refill it fits very nicely into the Pilot G2 Pro body. The G2 Pro body is my favorite everyday pen body, but the G2 cartridges (what do you call them when they include the ball, anyway?) tend to be uneven and even blotchy on me compared to the 207s. So, a couple seconds with a pocketknife and I have the best of both worlds!
1 writebacks [/stuff] permanent link
Ran across Nu today in a reference from Brandon Warner. Nu is an interpreted lisp dialect with close ties to ruby and objective-c. Really :-)
The best way to illustrate this is probably to look at a snippet, in this case from the nuke tool bundled with Nu:
(unless @prefix
(set @prefix
"#{((((NSProcessInfo processInfo) arguments) 0) dirName)}.."))
(unless @icon_files
(set @icon_files
(array "#{@prefix}/share/nu/resources/nu.icns")))
I won't point out the objc and ruby bits therein, as if you know one or both, you see them. It looks weird in places, but if you want to hack around on cocoa stuff, wowzers, it rocks. Check the converted form of ye olde currency converter (the first bit of cocoa programming for a lot of folks, myself included).
0 writebacks [/src] permanent link
Thu, 03 Jan 2008Right now I am functional enough in emacs to do basic things, like write this blog post, but I use too many OS X-isms, like splat-z for undo, and I need to turn those off and learn emacs properly :-)