Brian's Waste of Time

Sun, 15 Mar 2009

jDBI in JRuby

By lucky happenstance, the interfaces and idioms of jDBI work very nicely with JRuby's coercion from Ruby to Java.

require 'jruby'
require 'derby-10.4.2.0.jar'
require 'jdbi-2.2.2.jar'

dbi = org.skife.jdbi.v2.DBI.new("jdbc:derby:/tmp/woof;create=true")

dbi.withHandle do |h|
  h.createQuery("select name from woof").each do |rs|
    puts rs['name']
  end
end

It isn't perfect; I'd hoped the short form (rather than the fluent form) of the handle interface would work like

h.execute "insert into woof (name) values (:name)", ["brian"]

but alas, it does not. JRuby isn't coercing the ruby array into a Java object array to match the function signature (String, Object[]). Still, they do play awfully nicely together!

2 writebacks [/src/ruby] permanent link

Fri, 27 Feb 2009

Proper Fib

I am tired of seeing really inefficient Fibonacci sequence functions all over the place. I fear that someone might, someday, use one in a setting where it matters, so let me set the record straight with some proper fib examples!

Ruby

module Math
  PHI = (1 + Math.sqrt(5)) / 2
end
 
def fib n
  (((Math::PHI ** n) - ((1 - Math::PHI) ** n) ) / Math.sqrt(5)).to_i
end

Haskell

let fib n = 
        let phi = (1 + sqrt 5) / 2 in 
        round ((phi^n - (1 - phi)^n) / (sqrt 5))
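
And a quick Java sketch of the same closed form, since so much around here is Java anyway (a hypothetical Fib class, with the same floating point caveats as the versions above):

import static java.lang.Math.*;

class Fib {
    static final double PHI = (1 + sqrt(5)) / 2;

    // Binet's formula, same closed form as the Ruby and Haskell versions above
    static long fib(int n) {
        return round((pow(PHI, n) - pow(1 - PHI, n)) / sqrt(5));
    }
}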

Now I can go to bed :-)

6 writebacks [/src] permanent link

Wed, 28 Jan 2009

The New Fork, Visualized

This video by Ilya Grigorik nicely demonstrates the assertion that lowering the barrier to contribution increases contribution. You can see where they switched to git pretty clearly.

1 writebacks [/src] permanent link

Sat, 24 Jan 2009

The New Fork

David and Chris's discussion (1, 2, 3, 4, and 5) highlights a major shift going on in open source, right now. The shift: Forking is Good.

Historically, a fork has been seen as a sign of trouble in a project, and folks have gone to huge and painful means to prevent forks. Heck, not long ago the blogging rage was about which licenses helped prevent forks, BSD-style or GPL-style. Things are changing. It isn't really that new, though...

Aside from the big-obvious (Linux, which has encouraged forking, well, forever (in internet time)), even staid centralists like Apache talked about it as a good and natural thing. In that context it wasn't supposed to be a fork, though, it was supposed to be "a new version", and it stayed in the same source tree and was one big happy project.

At ApacheCon, in New Orleans, Dirk-Willem asked the key question, when discussing git, subversion, etc and the ASF -- "are we shaped by the tools, or do the tools shape us?" Many leapt to say that we shaped the tools, of course. It is, of course ;-), a blend. The tools shape the mindset which shapes us who shape the tools.

Forking is painful because merging changes between forks is painful. It is seen as a huge duplication of effort, a squandering of developer time, and an "I'm taking my ball and leaving, thbbft!" For some high profile things, such as X.org, this has been kind of true. Many times it can also be irreconcilable goals (such as Dragonfly). Many times, however, it is just "I need something, but a little different, and the best path for me right now is to take something and incompatibly change it" (nginx, for example).

Git (and others, but whatever, I am talking about git; while others support many-repos, only git really expects and encourages many repos) removes much of the effort from the forking equation. Github, in particular, as Chris makes clear, allows for very easy moving of code from fork to fork. Linus, a git I have never met, wrote software to support how Linux development works (ie, with Linus as the SCM), automating what he did, and along the way opened the tooling door for everyone else.

Personally, I love the "please fork my stuff" mindset. I write open source stuff to solve my problem, and hey, if it solves yours too, bonus, have fun, don't sue me. The bonuses add up, which is why I bother to do it in the open -- people do take it and run. I get patches, good feature requests, and whole ports of code I wrote and actively disowned a couple times a week. I always hate telling folks "I no longer maintain that, some guy named Chris, or maybe Anthony, took over, I think -- he hasn't made a release yet, though, so not sure what is up." If it is a bugfix someone sent me, I have done the spelunking to apply the fix and cut a release on code I expect to never use again (I take bugs personally), but aside from that it is "er, yeah, svn is here I think, let me know if you want commit."

There are also projects I use every day that are in a stable state (jdbi for example). It works great, I know other folks use it, but frankly, it is just a utility library. If you need changes, don't wait on me! Git, and something like github to make it even easier, means I can say "here is the code, fork it, go to town." A month later when I have spare time I can pull changes back, or if the forker really takes the ball and runs with it, I can just start using that code.

Where I think it doesn't work for people is when they have strong financial or emotional attachment to their code. If you make your living consulting on OpenWombat and uppity twenty year olds go and fork it all over, your job just got tougher. Worse, one of these forks may become more popular than OpenWombat, and you won't own it.

My personal opinion is to let the best code win, and the best code is a moving target. Chris's example, in the conversation with David, assumes the most active code line is the best code. Sometimes this is true, but more often (in my opinion) the more stable (not abandoned) code line is probably better. This is kind of Linux vs FreeBSD (and Linux clearly has won the popular vote) but... FreeBSD is solid as a brick in a way I doubt Linux will be for a long time yet, and stability trumps features for a lot of things. In a free-fork world, this difference of opinion is easily resolved, and the best features of the unstables can flow into the stables much more easily.

I am fascinated to see the shape of how this evolves. There are huge social and legal potholes ahead on the new fork, but in the end, as we have all proven, the best code does tend to win despite the efforts of trolls and zombies. Lowering the barriers to contributing and experimenting leads, more or less directly, to more development and experimentation, so I expect that the best code will tend to emerge from the network of forks which make sharing changes between them as easy as possible.

1 writebacks [/src] permanent link

Tue, 06 Jan 2009

Tokyo Tyrant is Awesome

If you are a hacker building a distributed system, drop what you are doing and go play with Tokyo Tyrant. I haven't been this excited about something since I first played with rails.

I am serious, stop reading, start compiling.

ps: The tokyocabinet in macports is antique; build your own. Tokyo Tyrant needs lua to be installed in /usr/local, which is annoying, but survivable until it gets patched to have a --with-lua.

1 writebacks [/src] permanent link

Mon, 15 Dec 2008

Real World Haskell, for Jon

A while back I told Jon that if he wrote something useful with Haskell I'd learn it. He proceeded to do something useful with it, so I have started working my way through the (excellent, so far) Real World Haskell.

So far, I like the language, but I haven't done anything useful with it.

2 writebacks [/src/haskell] permanent link

Mon, 08 Dec 2008

Why Apache?

People frequently ask why a project would want to move to Apache. The most recent case of this I have run across was in a thread on CouchDB's graduation, on Reddit. To answer the question, I'll take my apache hat off for a moment and put my consumer internet company architect (who codes as much as I can make the time for!) hat on.

The biggest benefit, to me, is that Apache provides a known way of doing things. We (remember, I have my $company_man hat on) don't use Couch right now (though we have certainly talked about it), but a major factor in whether we choose to use it is how the code gets developed, and how we can influence that in the direction that we need. With Apache, we know how it gets developed (in the open, all decisions on public mailing lists) and how to get involved (submit bugs and patches, which will be discussed on the mailing list; if we need more, keep submitting patches until they get tired of applying them and make me a committer; as a committer, keep working in the open; etc).

If you need to have influence over the project, say because you are creating a strategic dependency on it, you absolutely know that you can gain as much influence over an apache project as your competence allows. This is crucial, as the alternative is the willingness to maintain a fork if the developers go berserk or wander away, which happens. A major part of technology selection is balancing risks. It is not being totally risk averse, but it is being aware of the risks in critical dependencies and making the choice to accept the price if that risk converts into a liability. Having a guaranteed way to provide continuity to a project in the face of typical project killers, such as the project leader leaving the project, trumps merely having the freedom to fork.

1 writebacks [/src] permanent link

Sat, 23 Aug 2008

Excellence

I was reminded recently that excellence lies in executing the basics perfectly, every time, much more than in executing the advanced adequately.

2 writebacks [/src] permanent link

Wed, 30 Jul 2008

Using Virtual Nodes to Compact Vector Clocks

One hiccup encountered when using vector clocks is that there is no inherent way of reducing the size of the clock. Basically, any node which acts as a causal agent of change has the potential to be forever recorded in the clock. This leads to unbounded clock size over time. Most systems tend to have a limited number of causal, or lead, nodes providing clock values, so the problem is usually avoided, but sometimes you don't have that.

When vector clocks are used to track causality in a storage system, such as in Amazon's Dynamo system, it becomes possible to create synchronization points in the history of the element, between storage nodes, if the storage nodes are able to form consensus between themselves on the value of an element at a specific point in the element's history. If we are talking about an eventually consistent system, this can be done by using a background synchronization and merge algorithm which merges acausal changes in the background. Alternately, it could be client resolved, in systems like Dynamo, but that isn't my problem, so... I digress.

When the system believes it has a value at a given clock value, where the clock is causally related to the unified value on the other storage nodes holding the element, it can try to achieve consensus about this and, if successful, increment an artificial clock key which we'll call the epoch. The epoch value then subsumes the vector clock values associated with the epoch in the element, shrinking the element's clock.

To run through an example, let's say we have a system which uses three storage nodes for each element. We don't care exactly how these element values are assigned, except to recognize that it allows for non-causally related changes to occur. At a given point in time the storage nodes may have values for an element A, as follows:

Node   Clock
red    [red:2, green:1]
blue   [red:2, green:1, blue:2]
green  [red:3, green:1, blue:1]

A Paxos instance may be executed proposing that epoch:1 be [red:2, green:1]. As each node can agree that [red:2, green:1] comes before its value, it can accept the epoch value. Upon acceptance of the value, the clocks would become:

Node   Clock
red    [epoch:1]
blue   [epoch:1, blue:2]
green  [epoch:1, red:3, blue:1]
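
To make the subsumption step concrete, here is a rough Java sketch (clocks as plain maps of node name to counter; the VectorClocks and EPOCH_KEY names are purely illustrative, not from any real system):

import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch of epoch subsumption over a vector clock held as a Map.
class VectorClocks {
    static final String EPOCH_KEY = "epoch";

    // true if 'candidate' comes before (is dominated by) 'clock', i.e. every
    // counter in 'candidate' is <= the matching counter in 'clock'
    static boolean dominates(Map<String, Long> clock, Map<String, Long> candidate) {
        for (Map.Entry<String, Long> e : candidate.entrySet()) {
            if (clock.getOrDefault(e.getKey(), 0L) < e.getValue()) return false;
        }
        return true;
    }

    // once the epoch value has been accepted, fold the subsumed counters into a
    // single epoch entry, keeping only the counters that run past the epoch
    static Map<String, Long> subsume(Map<String, Long> clock,
                                     Map<String, Long> epochValue,
                                     long epochNumber) {
        Map<String, Long> compacted = new LinkedHashMap<>();
        compacted.put(EPOCH_KEY, epochNumber);
        for (Map.Entry<String, Long> e : clock.entrySet()) {
            long agreed = epochValue.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > agreed) {
                compacted.put(e.getKey(), e.getValue());
            }
        }
        return compacted;
    }
}

A node would only accept the proposal in the first place if dominates(its clock, epoch value) held; running green's clock [red:3, green:1, blue:1] through subsume with the accepted epoch value [red:2, green:1] gives [epoch:1, red:3, blue:1], as in the table above.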

Assuming a background reconciliation protocol, a system could apply an appropriate heuristic to decide when to attempt to increment the epoch. A good example of such would be after unrelated values have been successfully merged. When it makes sense, and how to back off to older clock values, really depends on the characteristics of the system being designed and how it will be used.

As pointed out in the Dynamo paper, systems where there tend to be a small number of keys in the clock don't generally have this problem; Dynamo avoids it by causing the clock keys to be based on a small number of likely coordinator nodes:

To this end, Dynamo employs the following clock truncation scheme: Along with each (node, counter) pair, Dynamo stores a timestamp that indicates the last time the node updated the data item. When the number of (node, counter) pairs in the vector clock reaches a threshold (say 10), the oldest pair is removed from the clock. Clearly, this truncation scheme can lead to inefficiencies in reconciliation as the descendant relationships cannot be derived accurately. However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated.

In something like that, it may not make a lot of sense -- the problem just doesn't tend to come up. On the other hand, other systems, such as one which uses a user id or session id as a clock key, would tend to generate larger clocks. This kind of keying can be useful for providing read-what-I-wrote consistency, but that is another discussion :-)

0 writebacks [/src] permanent link

Mon, 23 Jun 2008

Library Versioning, Redux

I am a big fan of the APR versioning guidelines, but there is an element they don't capture well somewhere between major (backwards incompat change) and minor (forward incompat change) in Java. If you follow the generally recommended practice of exposing things via interfaces (pure virtual classes), you have opened the door for users to implement those interfaces.

In a C-style world, adding a function to a library would bump you from 1.6 to 1.7, using APR guidelines. In an interface-driven Java-style world, adding a method to an interface would bump you from 1.6 to 2.0. Or would it?

To take a concrete example, a coworker (thanks Jax!) recently re-added first class support for callable statements to jDBI. jDBI uses a Handle interface to expose operations against a database. It has gained a method:

public <ReturnType> Call<ReturnType> createCall(String callableSql,
                                                CallableStatementMapper<ReturnType> mapper);

If you implement this interface, the change is backwards incompatible. An implementation of Handle made against 2.2.2 will not compile against this. On the other hand, the intent of the library is not for people to implement Handle, it is to expose the library's functionality. It is almost a header file.

So, 2.3 or 3.0?

3 writebacks [/src/java] permanent link

Thu, 15 May 2008

Topology Aware Consistency Policies

I am increasingly fascinated by the consistency options, in a distributed storage system, made available by topology awareness on the client. For example, if you consider a write committed iff the write has been made to a majority of all storage nodes and a majority of the local nodes, where local would typically be "same datacenter," it allows you to achieve read-what-you-wrote consistency locally when a majority of local nodes have responded to a read request with a matching response, while still providing overall consistency across the entire system.
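
A tiny Java sketch of that commit rule (node identities as plain strings; the names here are made up for illustration):

import java.util.Set;

// Sketch: a write counts as committed only when a majority of all storage
// nodes and a majority of the local (same datacenter) nodes have acked it.
class TopologyAwareQuorum {
    static boolean committed(Set<String> acks, Set<String> allNodes, Set<String> localNodes) {
        long ackedOverall = acks.stream().filter(allNodes::contains).count();
        long ackedLocally = acks.stream().filter(localNodes::contains).count();
        return ackedOverall > allNodes.size() / 2
            && ackedLocally > localNodes.size() / 2;
    }
}

The same majority-of-local test on the read side is what gives you the local read-what-you-wrote guarantee.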

0 writebacks [/src] permanent link

Sat, 10 May 2008

Long Tail Treasure Trove Slides!

Gianugo has posted the slides from our JavaOne presentation, on Slideshare and in pdf form. The talk was awesome to give, we had a killer audience. A huge thank you to all who attended!

2 writebacks [/src/java] permanent link

Wed, 23 Apr 2008

The Shape of Async Callback APIs

When we have async callbacks in a Java API, the idiomatic way of writing the interface to register the callback looks like:

Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
});

I'd like to propose that we adopt a new idiom, which is to pass an Executor along with the listener:

Executor myExecutor = Executors.newSingleThreadExecutor();
// ...
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
}, myExecutor);

The main benefit is that you give the caller control over the threading model for the callback. Right now, most libraries either have a separate thread pool for callbacks, or make the callback on the event generator thread. Usually there is nothing but an obscure reference on a wiki to indicate the behavior.
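
For illustration, the library side of that idiom might look something like this (a sketch only -- EventSource, Listener, and Event are made-up names, not any real API):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// The event thread only hands work to the caller-supplied Executor, so the
// caller decides which threads the callbacks actually run on.
class EventSource {
    interface Listener<T> {
        T onEvent(Event event);
    }

    static class Event {
        private final String nifty;
        Event(String nifty) { this.nifty = nifty; }
        String getSomethingNifty() { return nifty; }
    }

    private final List<Consumer<Event>> dispatchers = new CopyOnWriteArrayList<>();

    <T> Future<T> addListener(Listener<T> listener, Executor executor) {
        CompletableFuture<T> future = new CompletableFuture<>();
        dispatchers.add(event -> executor.execute(() -> {
            try {
                future.complete(listener.onEvent(event));
            } catch (Throwable t) {
                future.completeExceptionally(t);
            }
        }));
        return future;
    }

    // called from the event-generating thread; never runs user code directly
    void fire(Event event) {
        dispatchers.forEach(d -> d.accept(event));
    }
}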

2 writebacks [/src/java] permanent link

Tue, 15 Apr 2008

If We Had to Drop Java

So, thought experiment. If we, as an industry, had to drop Java, the language and the virtual machine, for some reason, what could really move into its niche?

Some points to consider:

Putting aside the "damn I want to use coolness X," what out there provides something that could do it?

4 writebacks [/src] permanent link

Sun, 16 Mar 2008

mod_wombat and the GSoC

Nathan wrote up a great blog post about thoughts for working on mod_wombat (Lua in Apache) for this coming Google Summer of Code. I'd be extremely excited (along with Nathan and Matthew, I suspect) to mentor someone on it if it sounds exciting to folks out there :-)

0 writebacks [/src/wombat] permanent link

Sat, 01 Mar 2008

Revisiting Groovy

I haven't actually used Groovy much since, oh, egads, umh, 2004. It was at 1.0b6 and was taking a direction which I both disagreed with and found kind of boring. It was throwing away large chunks of dynamicity for some performance gains as it decided it really wanted to be compiled, after all. Large chunks of the new syntax I also disagreed with so... I wandered away, wishing everyone luck.

Well, funny things can happen in, er, three and a half years, so when a coworker suggested we look at Groovy for solving a problem, my initial reaction was "erk, umh, I kind of like our use of JRuby for that" but Groovy wasn't even at 1.0 when last I used it, so it was a pretty unfair reaction. Looking again, it is at 1.5.4! Time to revisit!

After noting that none of my old code parsed, I worked through the tutorial. This isn't the same language I last used. It smells like Perl or PHP more than Ruby, which it rather resembled back then. Overall, my second "first" impression: totally practical, rather ornery. Will dig into it more.

0 writebacks [/src/groovy] permanent link

Wed, 27 Feb 2008

Method Chaining

A coworker commented to me today "what's up with all these libraries that encourage method chaining? ;-)" when we were talking about FEST. To stay in context, we are talking about this kind of thing:

assertThat(yoda).isInstanceOf(Jedi.class)
                .isEqualTo(foundJedi)
                .isNotEqualTo(foundSith);

This, of course, has also been called nice things like "train wreck" and is frequently seen to be a brittleness inducer in code. On the other hand, I encourage the heck out of it in libraries I write, from jDBI for example:

handle.prepareBatch("insert into something (id, name) values (:id, :name)")
        .add(1, "Brian")
        .add(2, "Keith")
        .add(3, "Eric")
        .execute();

On yet another hand, I pointed out that it was a bad practice to someone in a code review just last week. So, when is it a good fluent interface, and when is it a train wreck? Good question. My first reaction is "I know it when I see it" but that isn't very useful. So, to take a stab at a description...

Method chaining makes a good interface when the chained methods all come from the same module, are part of the published API, and when taken together represent a single logical action. In the first example, they are all on the published interface of FEST-Assert and are asserting that yoda is correct. In the second, they all come from the published interfaces of jDBI and form one batch statement.
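
To put the "same module, one logical action" point in code terms, the mechanics are just methods that return the receiver; a minimal sketch (illustrative only, not the real jDBI or FEST classes):

import java.util.ArrayList;
import java.util.List;

// Each method does its piece and hands back the same object, so the whole
// chain reads as a single logical statement against one published type.
class FluentBatch {
    private final List<String> rows = new ArrayList<>();

    FluentBatch add(int id, String name) {
        rows.add(id + ":" + name);
        return this;            // return the receiver so the next call can chain
    }

    int execute() {
        return rows.size();     // pretend to run the batch, report the row count
    }
}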

For a negative example, let's take data access traversal:

yoda.getMidiclorians().getForce().getDarkSide().getLightningFromFingers();

Here, even if the interfaces for all the intervening classes are in the same module, and are very stable, it sure as heck isn't a single logical unit.

Anyway, gotta run, lunch is done. If I think of a better way to describe it, I will do so this evening!

6 writebacks [/src] permanent link

Sun, 24 Feb 2008

Learning, Programming, Etc.

A happy coincidence of The Praggies and O'Reilly both doing bookamajigs focused on general, programmery, learning. O'Reilly's is an interesting take in that it is a collaborative, wiki-based venture. Andy Hunt's is triply interesting to me as I did my graduate work on the stuff he is writing about, if in a very different context (formal education).

Refactoring Your Wetware starts out with a nice review of the Dreyfus model (I grabbed the beta book) but is still mostly not-yet-written, so Andy's approach to progressing through the stages isn't clear, yet. I'm very much looking forward to seeing how he approaches and presents the long view of learning.

The O'Reilly approach hits close to home for me as I spent a lot of time experimenting with material from the Portland Pattern Repository when I transitioned back into programmering from teaching and realized I didn't actually remember much! Anything that helps self-taught folks get better is teh win.

0 writebacks [/src] permanent link

Sat, 02 Feb 2008

AtomSub

So AtomPub is a reasonable way to publish things, etc. Would be nice to push an AtomPub endpoint to a service as a callback for events. An awfully large number of things can accept HTTP now, and there is a reasonable basic-operation system available, so why not take advantage for callback APIs? Instead of polling a site for updates, post a subscription with an AtomPub endpoint as the entry and let the service push to you. AtomSub :-)

3 writebacks [/src] permanent link

Sat, 26 Jan 2008

an interesting milestone: mod_slow

Crossed some kind of threshold today, I am sure. I needed a quick'n'dirty web server hack so broke out C for an apache module! What is happening to me?!

Basically, I needed something to put behind a proxy to do some load and capacity testing of the proxy. As I wanted things like the size of the response and the time of the response to be easily configurable on the load generator, I needed to hack something up...

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"
#include "apr_time.h"
#include "apr_strings.h"

static int handler(request_rec *r)
{
    /* the query string is a sleep time in milliseconds; apr_sleep wants microseconds */
    if (r->args)
        apr_sleep(apr_atoi64(r->args) * 1000);
    /* decline so the default handler still serves the file after the delay */
    return DECLINED;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_handler(handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA slow_module = {
    STANDARD20_MODULE_STUFF, 
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    register_hooks
};

This very nicely lets me drop artificial slowdowns in front of the default handler (serve up files), so I can control "processing time" and file size (pick the file with the size I want): http://binky/big.html?2000 gives me a two second pause. Sweet! Am kind of floored that the first solution which leapt to mind for me was an apache module in C, though!

For some reason, putting the sleep in fixups doubled the sleep time, so I made it a declined handler and things worked fine. Need to figure out why.... someday.

2 writebacks [/src/apache] permanent link