Brian McCallister

Wed, 01 Sep 2004

A LtU...

A programmer may make a wrong language choice (i.e, not use Haskell for everything) without proving to be irrational...

:o)

My last post on graph paging continued to confuse (con-fuse not boggle -- I really need a better word) two ideas somewhat. Let's see if I can do better =)

To go back to programming kindergarten: we have four typical operations on persistent state: create, read, update, and delete. Three of these typically involve working with a small object graph: create, update, delete. They manipulate and mutate, and tweak, and work with, and generally implement confusing business so-called rules. The final one, read, has a big split.

The type of read op is the presentation of small data: username, birthday, shopping cart contents. The second type of read op handles huge volumes of data. These are almost always done via named queries because they would break o/r graph mapping tools, or rather, break the jvm by generating the lovable java.lang.OutOfMemoryError if two got executed concurrently somehow.

The first type of op, the small graph op, is mostly satisfied by the current crop of o/r m tools. The second set ranges from not too bad (OJB's persistence broker) to annoying (Hibernate's session) to not actually practicable (EJB CMP (they invented the "fast lane reader" for this one)). The strongest query language I know of for this is probably HQL (hey, I love OJB, and generally prefer it to Hibernate as it is lighter weight/lower level in my preferred form), but HQL is a very nice query language =). It isn't perfect, but it is probably the most useful one we have right now. Oddly enough, while it is optimized (language constructs) for writing reporting queries (lots of results), it is tied to a small-graph manipulation library (Hibernate).

So, this is completely different from the aforementioned previous post. It doesn't touch on any of the ideas. The reason is that I think these two very different beasts should probably be seperated, or at least handled very differently. Small graph manipulation is, I strongly suspect, much better served via a graph-paging system. Reporting is best served, I am completely convinced, by a result stream.

The nice part here is that you should be able to re-use huge swathes of the code =)

The closest solution I have right now is probably using OJB's persistence broker (report query by iterator) for reporting and the OTM or ODMG (ick) for graph manipulation (this will change in 1.1, and already has in CVS, where full object-transaction graph manipulation is available when wanted from the PB and you can use the same client interface for high level and low level ops :o). Using iBatis with Hibernate also seems very popular, and works pretty well (high volume reports go through iBatis, graph manipulation through Hibernate).

Speaking of large result sets, I also really want to be able to pipeline these puppies. A callback based reader which grabs non-object-transactional instances (which are immediately recycled after the callback returns to help with memory thrashing) would be handy which can be the end of a pass-through right from a streaming result set. Luckily, I don't have to deal with multi-gigabyte result sets anymore (sometimes I miss it, though, unusual constraints are fun to work with).

2 writebacks [/src/java/ojb] permanent link

Pie Decorators!

Guido made up (almost) for laying the smack down on tail-call optimization by not accepting J2 in place of pie, in my mind =) I know adding punctuation with meaning is not the way they usually go, but this is a good one, as Guido explains better than I could.

Is that cryptic enough? I really love the way decorators are being done there, though.

0 writebacks [/src/python] permanent link

Graph Paging

Playing with JDO 2 fetch groups, ZODB, thinking about TranQL (for two months now), playing with Prevayler, and looking at TORPEDO (need to run OJB against it when I have a chance) something clicked for me which I think clicked for some other people a long time ago -- but somehow got lost in the hullabaloo. We may all be doing O/R mapping wrong. Seriously, we probably are.

The current popular approach is a thin wrapper arounnd JDBC. It is what OJB, Hibernate, and JPOX all do. I cannot comment on Kodo and Toplink as I cannot go browse around their sources, but I suspect it is the same. This is how we are used to thinking about it -- the objects you get are basically a stream (or collection) of database results.

This isn't really what they are though. They are really closer to a swapped in page of the entire object graph. The query mechanism for the object graph, and query mechanism for the backend get confused (in the con-fuse sense). The JDO spec has the right idea in seperating object queries from persistence store queries (I do tend to agree with Gavin King that the JDOQL query language itself is somewhat less than elegant). The editing context can contain more or less than has been queried for, as long as what is accessed is available when it is needed.

When you need to obtain a handle on an instance, a query language is bloody useful. OGNL defines a better object query language than either OQL, JDOQL or HSQL, though -- if you are talking purely objects. HSQL evolved as it did to avoid the loss inherent in this abstraction though, and works nicely. You are querying into the editing context though, and the context can determine, seperately form the exact query, what it does not already have loaded (thank you Jeremy and Dain). This is a lot of work probably best done in a haskell style language optimized for doing fun maths rather than pushing bits.

Once you are maintaining graph pages instead of flat contexts, and issueing queries against the page rather than the backend, you can do nice things like absurdly optimize your queries into the backend (query the backend specifically for the disjunction of the predicate for the current query and the union of all predicates known to be in the current page (thank you, again, Jeremy and Dain)). The paging system certainly knows about the database, and needs to be able to write extremely optimized code (sql) to pull data out of it, but the client of the paging system really is better off being able to describe queries in terms of object behaviors.

Providing hinting about what objects are going to be needed, rather than how to pull them from the rdbms (hinting is what you are really doing when you ask postgres (only rdbms whose internals I have poked at much, Oracle's not being available to me) to use a join, unless you do a lot of configuration to make it not so) becomes a lot more useful as you can express the same intention in a way that lets the system know what you want, rather than flat out telling it. A perfect example of an optimization that would be tough to do by hand here is to stream elements in a collection down the join chain from the primary queried entities rather than pulling themin the initial join. In HSQL you would join them as you *know* you will need them, but what you really know is that the JSP needs them for rendering a while in the future, and on a different jvm. A mechanism to supply hints that these things will be needed, and will be needed as a one-pass stream (this may be too low level) when they get serialized out allows for much better actual throughput. The best way to provide this type of hinting would be hard to work out, but fun as heck to do -- and worth it!

This type of throughput-oriented hinting is hard to do through any existing o/r mapper I know of. It is not difficult to describe, however. It really begs for a flexible object query language. You can get the equivalent type of behavior in JDBC right now, but not in a useful way to OJB or Hibernate at least. This is just one example, you can use your imagination for others =)

The big problem here is that what I am talking about is most of a dbms. You need to handle snapshotting for transactions, dirtying predicates, etc. It just uses a relational database for its actual backend. In theory EJB's were designed to be able to do this, but I don't think any of them actually do. Is the problem just too hard? I have a lot of trouble believing that -- if you can formulate the questions correctly, you can pretty much build the solution. It just ain't easy to do -- and easy is seductive. Yea hard problems!

This is also a big abstraction -- and one that bets it can provide the correct knobs to allow the programmer to dive through it when needed. There is risk in a big abstraction, but then again, there are reasons we use Ruby, er, I mean Java, instead of assembly ;-)

This is a big hunk of code, and dives into math instead of simple bit-pushing, making it fun code! Definately outside the scope of my (one person) spare time programming, unfortunately =(

8 writebacks [/src/java/ojb] permanent link

Brian's Waste of Time