Brian's Waste of Time

Sun, 15 Mar 2009

jDBI in JRuby

By lucky happenstance, the interfaces and idioms of jDBI work very nicely with JRuby's coercion from Ruby to Java.

require 'jruby'
require 'derby-'
require 'jdbi-2.2.2.jar'

dbi = org.skife.jdbi.v2.DBI.new("jdbc:derby:/tmp/woof;create=true")

dbi.withHandle do |h|
  h.createQuery("select name from woof").each do |rs|
    puts rs['name']
  end
end

It isn't perfect; I'd hoped the short form (rather than the fluent form) of the handle interface would work like

h.execute "insert into woof (name) values (:name)", ["brian"]

but alas, it does not. JRuby isn't coercing the Ruby array into a Java object array to match the method signature (String, Object[]). Still, they do play awfully nicely together!

2 writebacks [/src/ruby] permanent link

Fri, 27 Feb 2009

Proper Fib

I am tired of seeing really inefficient Fibonacci sequence functions all over the place. I fear that someone might, someday, use one in a setting where it matters, so let me set the record straight with some proper fib examples!


module Math
  PHI = (1 + Math.sqrt(5)) / 2
end

def fib n
  (((Math::PHI ** n) - ((1 - Math::PHI) ** n)) / Math.sqrt(5)).to_i
end


let fib n = 
        let phi = (1 + sqrt 5) / 2 in 
        round ((phi^n - (1 - phi)^n) / (sqrt 5))
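Both versions lean on floating point, and doubles only carry about 16 significant digits, so the closed form starts drifting for larger n (in Ruby, somewhere around fib(71)). For completeness, here is an exact integer sketch as well (`fib_exact` is my name for it, just to keep it apart from the closed form):

```ruby
# Exact Fibonacci via integer iteration: O(n), no floating point,
# so no precision cliff; Ruby's automatic Bignum promotion keeps it
# correct for arbitrarily large n.
def fib_exact(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end
```

Still constant-space, still a few lines, and fib_exact(1000) comes out exact.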

Now I can go to bed :-)

6 writebacks [/src] permanent link

Sun, 15 Feb 2009

My Personal IT

I am asked questions about parts of this a lot, so I figured I'd give a rundown of my personal IT setup. I think of it as a personal IT system as the sum is actually more than the parts, and when someone asks what I use for email, or keeping track of things, or phone numbers, or... well, whatever, my answer always includes an explanation of how what I use interacts with what else I use. I'll try to keep the explanations geared towards non-technical folks.


At the center of your personal IT system is email. You need to have a primary email address on a domain you own, with your mail stored somewhere secure, portable, and with plenty of storage. I use, and highly recommend, Google Apps Premier. It costs $50/year (per person, which matters if you are setting it up for your family, for instance) and is worth every penny. The webmail interface is the best available, and you can use any email client (Outlook, Apple Mail, Thunderbird, etc) you like with it, if you prefer those.

The reasons for GMail (as part of Google Apps) are that Google is now hosting your data, backing it up, providing a good web based client, providing decent search on it, and providing an SLA on it. That they provide an SLA is a big deal, even if it isn't that great an SLA (99% uptime when I signed up). In addition to this (yes, there is more), other things I use integrate nicely with it, as we will see shortly.

Let me re-emphasize the need to own the domain your email is on. You need to be able to change where your email is hosted, or how it is handled, at times. If your email is delivered to a domain you do not own, you are at the mercy of the owner to do it for you. In general, this means you will need to get a new email address. I am sure you have gone through the ritual of sending out "please use this new email address for me" messages, and have been on the receiving end of "does this email address still work?" messages. End it: buy a domain and use it for your email. I like for handling domain purchases, personally. There are cheaper options, but you are not looking for cheap here, you are looking for reliable and trustworthy.


My wife finally got tired of being my calendar, so I started using the calendar part of Google Apps. It actually works really well, and, to be honest, I am glad I switched from having her remind me of things to using it. I haven't used as many calendaring tools as email tools, but of Notes, Exchange, Zimbra, and Google Calendar, Google Calendar is the clear winner for personal IT.

It hooks cleanly into GMail, which we are already using, supports multiple calendars, sharing (including full scheduling control, though I haven't managed to get my wife to take advantage of this yet), will send email and SMS reminders (SMS reminders are big for me), and other things I use hook into it nicely. Win!

There is the free version, but this comes as part of the Google Apps Premier Edition already mentioned, so I would just go ahead and make use of that one -- it will be all wired up for use with your email, so will make life easier, which is the point.

TODO Tracking

For a long time I just used a notebook and recopied my lists every day onto the new day's page. I hated it, but the fact that I carried my notebook (my "externalized memory" as I put it, or "his brain" as my father calls his) everywhere decided this one for me.

When I got my iPhone, this changed. I tried out a variety of iPhone apps (there are probably more TODO variant iPhone apps than anything else), and settled on Appigo's Todo app (somewhere around $10). I actually liked Cultured Code's Things iPhone app more for entry and general use, but Todo won out because it syncs with Remember the Milk, which is the second half of my TODO setup.

Remember the Milk (RTM) is basically a web based TODO list app. I think RTM is much more complicated than it needs to be (it seems to fall prey to the Getting Things Done over-complicated-lists craze), but you can ignore three quarters of it, and it rocks. Again, the data is backed up by them, they provide a decent web interface, and it hooks nicely into Quicksilver (a desktop app for macs which is magical), and more importantly for me, Todo on my phone. You need a Pro account ($25/year) to sync.

The combination of good iPhone interface, syncing, and web access is very important for me. As I have now joined the blackberry-wielding hordes who store their memories in their phone, I need to make sure I can get those memories back if the phone goes away, isn't working, or whatever. The web is pretty ubiquitous (and if the web goes away for an extended time, I have much bigger problems).

It bears mentioning that Remember the Milk has its own, free, iPhone app which is quite good and you could probably go with just that. The interactions with Todo (and actually, Things even more so) just roll off my fingers faster, so I use it. If Things adds the ability to sync with RTM I will reevaluate switching to it.

Remember the Milk has very nice integration with Google Calendar, though setting it up when you use Google Calendar On Your Own Domain (the premier edition setup I suggested) is not as smooth as it should be, yet. It also has a decent (basically usable) plugin which you can put in GMail, or a few other places.

IM and SMS

I lump these together, but they don't really belong together. I use SMS, it is handy. Not much more to say. I use Adium (on the mac) for IM, and Meebo for ad hoc IM (ie, from someone else's computer) over the web. I also use Meebo (via the web!) for IM on my phone. Meebo's iPhone client is better than any of the regular iPhone IM apps I have tried. Both of these are free (yea!).

A nice thing to note: part of the aforementioned Google Apps is GTalk (Google's IM network, which is mostly compliant with a standard for IM called XMPP). This means you have a GTalk account with your email, which is handy. Making GTalk work with other XMPP things is generally more work than it is worth, though, unfortunately.

In Conclusion

There are other things I use, but these really form the heart of my externalized brain at this point. The key principles which drove the setup were that I wanted everything to be internet accessible, externally hosted (so I cannot forget to make backups), and playing well with the other pieces. I am very happy to pay for good service, and in the case of Google Apps, am glad I can, as they sometimes have "oops, we deleted your account" issues with the free ones (Google is not alone in this; Yahoo, Hotmail, and everyone else I know of do this too with their free offerings at times).

4 writebacks [/stuff] permanent link

Wed, 28 Jan 2009

The New Fork, Visualized

This video by Ilya Grigorik nicely demonstrates the assertion that lowering the barrier to contribution increases contribution. You can see where they switched to git pretty clearly.

1 writebacks [/src] permanent link

Sat, 24 Jan 2009

The New Fork

David and Chris's discussion ( 1, 2, 3, 4, and 5) highlights a major shift going on in open source, right now. The shift: Forking is Good.

Historically, a fork has been seen as a sign of trouble in a project, and folks have gone to huge and painful means to prevent forks. Heck, not long ago the blogging rage was about which licenses helped prevent forks, BSD-style or GPL-style. Things are changing. It isn't really that new, though...

Aside from the big-obvious (Linux, which has encouraged forking, well, forever (in internet time)), even staid centralists like Apache talked about it as a good and natural thing. In that context it wasn't supposed to be a fork, though, it was supposed to be "a new version", and it stayed in the same source tree and was one big happy project.

At ApacheCon, in New Orleans, Dirk-Willem asked the key question, when discussing git, subversion, etc and the ASF -- "are we shaped by the tools, or do the tools shape us?" Many leapt to say that we shaped the tools, of course. It is, of course ;-), a blend. The tools shape the mindset which shapes us who shape the tools.

Forking is painful because merging changes between forks is painful. It is seen as a huge duplication of effort, a squandering of developer time, and a "I'm taking my ball and leaving, thbbft!" For some high profile things, such as, this has been kind of true. Many times it can also be irreconcilable goals (such as Dragonfly). Many times, however, it is just "I need something, but a little different, and the best path for me right now is to take something and incompatibly change it" (nginx, for example).

Git (and others, but whatever, I am talking about git, and while others support many-repos, only git really expects and encourages many repos) removes much of the effort part from the forking equation. Github, in particular, as Chris makes clear, allows for very easy moving of code from fork to fork. Linus, a git I have never met, wrote software to support how Linux development works (ie, with Linus as the SCM), automating what he did, and along the way opened the tooling door for everyone else.

Personally, I love the "please fork my stuff" mindset. I write open source stuff to solve my problem, and hey, if it solves yours too, bonus, have fun, don't sue me. The bonuses add up, which is why I bother to do it in the open -- people do take it and run. I get patches, good feature requests, and whole ports of code I wrote and actively disowned a couple times a week. I always hate telling folks "I no longer maintain that, some guy named Chris, or maybe Anthony, took over, I think -- he hasn't made a release yet, though, so not sure what is up." If it is a bugfix someone sent me, I have done the spelunking to apply the fix and cut a release on code I expect to never use again (I take bugs personally), but aside from that it is "er, yeah, svn is here I think, let me know if you want commit."

There are also projects I use every day and are in a stable state (jdbi for example). It works great, I know other folks use it, but frankly, it is just a utility library. If you need changes, don't wait on me! Git, and something like github to make it even easier, means I can say "here is the code, fork it, go to town." A month later when I have spare time I can pull changes back, or if the forker really takes the ball and runs with it, I can just start using that code.

Where I think it doesn't work for people is when they have strong financial or emotional attachment to their code. If you make your living consulting on OpenWombat and uppity twenty year olds go and fork it all over, your job just got tougher. Worse, one of these forks may become more popular than OpenWombat, and then you won't own it.

My personal opinion is to let the best code win, and the best code is a moving target. Chris's example, in the conversation with David, assumes the most active code line is the best code. Sometimes this is true, but more often (in my opinion) the more stable (not abandoned) code line is probably better. This is kind of Linux vs FreeBSD (and Linux clearly has won the popular vote) but... FreeBSD is solid as a brick in a way I doubt Linux will be for a long time yet, and stability trumps features for a lot of things. In a free-fork world, this difference of opinion is easily resolved, and the best features of the unstables can flow into the stables much more easily.

I am fascinated to see the shape of how this evolves. There are huge social and legal potholes ahead on the new fork, but in the end, as we have all proven, the best code does tend to win despite the efforts of trolls and zombies. Lowering the barriers to contributing and experimenting leads, more or less directly, to more development and experimentation, so I expect that the best code will tend to emerge from the network of forks which make sharing changes between them as easy as possible.

1 writebacks [/src] permanent link

Tue, 06 Jan 2009

Tokyo Tyrant is Awesome

If you are a hacker building a distributed system, drop what you are doing and go play with Tokyo Tyrant. I haven't been this excited about something since I first played with Rails.

I am serious, stop reading, start compiling.

ps: The tokyocabinet in macports is antique, build your own. Tokyo Tyrant needs lua to be installed in /usr/local -- annoying, but survivable until it is patched to have a --with-lua option.

1 writebacks [/src] permanent link

Sun, 28 Dec 2008

Lazy Web: Blog Software

I'm using a customized blosxom at the moment, and it has served me well (particularly the static generation part) but it has been increasingly grating on me. Enough so that the grating has contributed to my hiatus from blogging, and I want to remedy this. So, I am looking for what to replace it with.

I want to find something that represents posts as text. Plain html is fine, I do that with blosxom, but some other structure that accommodates formatting and code snippets is also fine.

I want to be able to write incrementally, in emacs, and have in-progress stuff in a VCS -- git, svn, whatever, I don't care that much, but offline access to the whole shebang makes one of the dvcs varieties preferable. I would love for publishing to be just merging to a published branch.

It should generate static content -- that is static html, atom, whatever else it generates.

Comments are part of the content, and frankly, I want to be notified of them by IM or email. I really don't want to outsource comments either.

It needs to run on a pretty old, pretty low powered, kind of crufty unix like server.

I don't need to port any existing entries to it -- a nice thing about static text is that it doesn't go away -- everything on this incarnation can, and will, stay exactly where it is.

So, what is shiny?

3 writebacks [/stuff] permanent link

Mon, 15 Dec 2008

Real World Haskell, for Jon

A while back I told Jon that if he wrote something useful with Haskell I'd learn it. He proceeded to do something useful with it, so I have started working my way through the (excellent, so far) Real World Haskell.

So far, I like the language, but I haven't done anything useful with it.

2 writebacks [/src/haskell] permanent link

Mon, 08 Dec 2008

Why Apache?

People frequently ask why a project would want to move to Apache. The most recent case of this I have run across was in a thread on CouchDB's graduation, on Reddit. To answer the question, I'll take my Apache hat off for a moment and put my consumer internet company architect (who codes as much as I can make the time for!) hat on.

The biggest benefit, to me, is that Apache provides a known way of doing things. We (remember, I have my $company_man hat on) don't use Couch right now (though we have certainly talked about it), but a major factor if we choose to use it is how the code gets developed, and how we can influence that in the direction that we need. With Apache, we know how it gets developed (in the open, all decisions on public mailing lists) and how to get involved (submit bugs and patches which will be discussed on the mailing list; if we need more, keep submitting patches until they get tired of applying them and make us committers; as committers, keep working in the open; etc).

If you need to have influence over the project, say because you are creating a strategic dependency on it, you absolutely know that you can gain as much influence over an Apache project as your competence allows. This is crucial, as the alternative is the willingness to maintain a fork if the developers go berserk or wander away, which happens. A major part of technology selection is balancing risks. It is not being totally risk averse, but it is being aware of the risks in critical dependencies and making the choice to accept the price if that risk converts into a liability. Having a guaranteed way to provide continuity to a project in the face of typical project killers, such as the project leader leaving the project, trumps merely having the freedom to fork.

1 writebacks [/src] permanent link

Wed, 05 Nov 2008

Obama Wins!


0 writebacks [/stuff] permanent link

Sat, 23 Aug 2008


I was reminded recently that excellence lies in executing the basics perfectly, every time, much more than in executing the advanced adequately.

2 writebacks [/src] permanent link

Wed, 30 Jul 2008

Using Virtual Nodes to Compact Vector Clocks

One hiccup encountered when using vector clocks is that there is no inherent way of reducing the size of the clock. Basically, any node which acts as a causal agent of change has the potential to be forever recorded in the clock. This leads to unbounded clock growth over time. Most systems tend to have a limited number of causal, or lead, nodes providing clock values, so in practice the problem is avoided, but sometimes you don't have that.

When vector clocks are used to track causality in a storage system, such as in Amazon's Dynamo system, it becomes possible to create synchronization points in the history of the element, between storage nodes, if the storage nodes are able to form consensus between themselves on the value of an element at a specific point in the element's history. If we are talking about an eventually consistent system, this can be done by using a background synchronization and merge algorithm which merges acausal changes in the background. Alternately, it could be client resolved, in systems like Dynamo, but that isn't my problem, so... I digress.

When the system believes it has a value at a given clock value, where the clock is causally related to the unified value on the other storage nodes holding the element, it can try to achieve consensus about this and, if successful, increment an artificial clock key which we'll call the epoch. The epoch value then subsumes the vector clock values associated with the epoch in the element, shrinking the element's clock.

To run through an example, let's say we have a system which uses three storage nodes for each element. We don't care exactly how these elements values are assigned, except to recognize that it allows for non-causally related changes to occur. At a given point in time the storage nodes may have values for an element A, as follows:

red[red:2, green:1]
blue[red:2, green:1, blue:2]
green[red:3, green:1, blue:1]

A Paxos instance may be executed proposing that epoch:1 be [red:2, green:1]. As each node can agree that [red:2, green:1] comes before its value, it can accept the epoch value. Upon acceptance of the value, the clocks would become:

red[epoch:1]
blue[epoch:1, blue:2]
green[epoch:1, red:3, blue:1]

Assuming a background reconciliation protocol, a system could apply an appropriate heuristic to decide when to attempt to increment the epoch. A good example of such would be after unrelated values have been successfully merged. When it makes sense, and how to back off to older clock values, really depends on the characteristics of the system being designed and how it will be used.
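To make the subsumption step concrete, here is a rough Ruby sketch (the helper names dominated_by? and compact are mine, and the consensus round that agrees on the epoch's base clock is assumed to have already succeeded):

```ruby
# A node may fold the agreed base clock into epoch e only if the
# base happened before (or equals) its own clock: every counter in
# the base must be matched or exceeded locally.
def dominated_by?(base, clock)
  base.all? { |node, counter| clock.fetch(node, 0) >= counter }
end

# Replace the entries covered by the agreed base with a single
# epoch entry, keeping only counters that advanced past the base.
def compact(clock, epoch, base)
  return clock unless dominated_by?(base, clock)
  residue = clock.reject { |node, counter| base[node] == counter }
  { epoch: epoch }.merge(residue)
end
```

Run against the example clocks above with base [red:2, green:1], it reproduces the compacted values shown; a clock the base does not dominate is left untouched for a later attempt.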

As pointed out in the Dynamo paper, systems where there tend to be a small number of keys in the clock don't generally have this problem; Dynamo avoids it by causing the clock keys to be based on a small number of likely coordinator nodes:

To this end, Dynamo employs the following clock truncation scheme: Along with each (node, counter) pair, Dynamo stores a timestamp that indicates the last time the node updated the data item. When the number of (node, counter) pairs in the vector clock reaches a threshold (say 10), the oldest pair is removed from the clock. Clearly, this truncation scheme can lead to inefficiencies in reconciliation as the descendant relationships cannot be derived accurately. However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated.

In something like that, it may not make a lot of sense -- the problem just doesn't tend to come up. On the other hand, other systems, such as one which uses a user id, or session id, as a clock key would tend to generate larger clocks. This kind of keying can be useful for providing read-what-I-wrote consistency, but that is another discussion :-)

0 writebacks [/src] permanent link

Mon, 23 Jun 2008

Library Versioning, Redux

I am a big fan of the APR versioning guidelines, but there is an element they don't capture well, somewhere between major (backwards incompatible change) and minor (forwards incompatible change), in Java. If you follow the generally recommended practice of exposing things via interfaces (pure virtual classes), you have opened the door for users to implement those interfaces.

In a C-style world, adding a function to a library would bump you from 1.6 to 1.7, using APR guidelines. In an interface-driven Java-style world, adding a method to an interface would bump you from 1.6 to 2.0. Or would it?

To take a concrete example, a coworker (thanks Jax!) recently re-added first class support for callable statements to jDBI. jDBI uses a Handle interface to expose operations against a database. It has gained a method:

public <ReturnType> Call<ReturnType> createCall(String callableSql, 
                                     CallableStatementMapper<ReturnType> mapper);

If you implement this interface, the change is backwards incompatible. An implementation of Handle made against 2.2.2 will not compile against this. On the other hand, the intent of the library is not for people to implement Handle, it is to expose the library's functionality. It is almost a header file.

So, 2.3 or 3.0?

3 writebacks [/src/java] permanent link

Sat, 14 Jun 2008


I am quite liking yasnippet so far. I have converted my bloggie stuff over from textmate to it, and it works darned nicely. Snippet definition is easy, and clear. Woot!

0 writebacks [/emacs] permanent link

Thu, 15 May 2008

Topology Aware Consistency Policies

I am increasingly fascinated by the consistency options, in a distributed storage system, made available by topology awareness on the client. For example, if you consider a write committed iff the write has been made to a majority of all storage nodes and a majority of the local nodes, where local would typically be "same datacenter," it allows you to achieve read-what-you-wrote consistency locally when a majority of local nodes have responded to a read request with a matching response, while still providing overall consistency across the entire system.
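As a sketch of that commit rule (the names are mine; acks stands for the set of node ids that acknowledged the write, and the node sets are what the topology-aware client knows about):

```ruby
# True when subset holds a strict majority of whole.
def majority?(subset, whole)
  subset.size > whole.size / 2
end

# A write is committed iff a majority of all storage nodes AND a
# majority of the local (same-datacenter) nodes acknowledged it.
def committed?(acks, all_nodes, local_nodes)
  majority?(acks & all_nodes, all_nodes) &&
    majority?(acks & local_nodes, local_nodes)
end
```

The same majority? test over the local node set is what a local reader uses: once a majority of local nodes return matching responses, it knows it is seeing its own writes.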

0 writebacks [/src] permanent link

Sat, 10 May 2008

Long Tail Treasure Trove Slides!

Gianugo has posted the slides from our JavaOne presentation, on Slideshare and in pdf form. The talk was awesome to give, we had a killer audience. A huge thank you to all who attended!

2 writebacks [/src/java] permanent link

Wed, 23 Apr 2008

The Shape of Async Callback APIs

When we have async callbacks in a Java API, the idiomatic way of writing the interface to register the callback looks like:

Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
});

I'd like to propose that we adopt a new idiom, which is to pass an Executor along with the listener:

Executor myExecutor = Executors.newSingleThreadExecutor();
// ...
Future<Foo> f = asyncEventThing.addListener(new Listener<Foo>() {
  public Foo onEvent(Event e) {
    return new Foo(e.getSomethingNifty());
  }
}, myExecutor);

The main benefit is that you give the caller control over the threading model for the callback. Right now, most libraries either have a separate thread pool for callbacks, or make the callback on the event generator thread. Usually there is nothing but an obscure reference on a wiki to indicate the behavior.

2 writebacks [/src/java] permanent link

Thu, 17 Apr 2008

My Favorite Bash Completion

As I got hit with a meme about command line stuff, I figured I'd share an update to my favorite bash completion:

SSH_COMPLETE=( $(cut -f1 -d' ' ~/.ssh/known_hosts |\
                 tr ',' '\n' |\
                 sort -u |\
                 grep -e '[[:alpha:]]') )
complete -o default -W "${SSH_COMPLETE[*]}" ssh

If you ssh directly to IP addresses very often, you might want to leave off the last grep -e.

Not going to tag anyone, but if you have a favorite completion, please share! (I suggest not in a comment on this post as my comment system does not preserve any formatting).

5 writebacks [/stuff] permanent link

That History Thing

bakert tagged me, so:

brianm@binky:~$ history | awk {'print $2'} | sort | uniq -c | sort -k1 -rn | head
 164 svn
  52 cd
  42 ssh
  32 sudo
  22 git
  16 ls
  16 for
  14 echo
  13 man
  10 curl

Sadly, I only seem to keep a 500 line .history -- need to fix that.

0 writebacks [/stuff] permanent link

Tue, 15 Apr 2008

If We Had to Drop Java

So, thought experiment. If we, as an industry, had to drop Java, the language and the virtual machine, for some reason, what could really move into its niche?

Some points to consider:

Putting aside the "damn I want to use coolness X," what out there provides something that could do it?

4 writebacks [/src] permanent link