Brian's Waste of Time

Wed, 01 Sep 2004

Graph Paging

Between playing with JDO 2 fetch groups and ZODB, thinking about TranQL (for two months now), playing with Prevayler, and looking at TORPEDO (need to run OJB against it when I have a chance), something clicked for me that I think clicked for some other people a long time ago -- but somehow got lost in the hullabaloo. We may all be doing O/R mapping wrong. Seriously, we probably are.

The current popular approach is a thin wrapper around JDBC. It is what OJB, Hibernate, and JPOX all do. I cannot comment on Kodo and Toplink as I cannot go browse around their sources, but I suspect it is the same. This is how we are used to thinking about it -- the objects you get are basically a stream (or collection) of database results.

This isn't really what they are, though. They are really closer to a swapped-in page of the entire object graph. The query mechanism for the object graph and the query mechanism for the backend get confused (in the con-fuse sense). The JDO spec has the right idea in separating object queries from persistence store queries (I do tend to agree with Gavin King that the JDOQL query language itself is somewhat less than elegant). The editing context can contain more or less than has been queried for, as long as what is accessed is available when it is needed.

When you need to obtain a handle on an instance, a query language is bloody useful. OGNL defines a better object query language than OQL, JDOQL, or HQL -- if you are talking purely objects. HQL evolved as it did to avoid the loss inherent in this abstraction, and works nicely. You are querying into the editing context, though, and the context can determine, separately from the exact query, what it does not already have loaded (thank you Jeremy and Dain). This is a lot of work, probably best done in a Haskell-style language optimized for doing fun maths rather than pushing bits.

Once you are maintaining graph pages instead of flat contexts, and issuing queries against the page rather than the backend, you can do nice things like absurdly optimize your queries into the backend: query the backend only for what matches the predicate of the current query and is not already covered by the union of all predicates known to be in the current page (thank you, again, Jeremy and Dain). The paging system certainly knows about the database, and needs to be able to write extremely optimized code (SQL) to pull data out of it, but the client of the paging system really is better off being able to describe queries in terms of object behaviors.
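To make the predicate trick a little more concrete, here is a minimal sketch of how a page might work out what to ask the backend for. Everything in it is hypothetical: GraphPage, residualWhere, and markLoaded are names I made up for illustration, not anything from OJB, Hibernate, or JDO, and predicates are plain SQL fragments rather than real predicate objects. It also assumes the predicates never evaluate to NULL, since SQL's three-valued logic would otherwise make the NOT drop rows that were never loaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names do not come from any existing O/R mapper.
final class GraphPage {
    // SQL predicates whose matching rows are already in this page,
    // e.g. "dept_id = 7". Assumed never to evaluate to NULL.
    private final List<String> loaded = new ArrayList<>();

    /**
     * Builds the WHERE clause to send to the backend for a new query:
     * rows matching the query predicate that no earlier predicate covered.
     */
    String residualWhere(String queryPredicate) {
        if (loaded.isEmpty()) {
            // Nothing is paged in yet, so the whole query goes to the backend.
            return "(" + queryPredicate + ")";
        }
        // Union of everything the page already holds.
        StringBuilder known = new StringBuilder();
        for (String p : loaded) {
            if (known.length() > 0) {
                known.append(" OR ");
            }
            known.append("(").append(p).append(")");
        }
        return "(" + queryPredicate + ") AND NOT (" + known + ")";
    }

    /** Records that everything matching the predicate is now in the page. */
    void markLoaded(String predicate) {
        loaded.add(predicate);
    }
}
```

A real paging system would want to simplify the resulting predicate (and notice when it is provably empty, so no backend query is needed at all), which is exactly the fun-maths part.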

Providing hinting about what objects are going to be needed, rather than how to pull them from the RDBMS, becomes a lot more useful, as you can express the same intention in a way that lets the system know what you want rather than flat out telling it. (Hinting is what you are really doing when you ask postgres to use a join, unless you do a lot of configuration to make it not so. Postgres is the only RDBMS whose internals I have poked at much, Oracle's not being available to me.) A perfect example of an optimization that would be tough to do by hand here is to stream elements in a collection down the join chain from the primary queried entities rather than pulling them in the initial join. In HQL you would join them as you *know* you will need them, but what you really know is that the JSP needs them for rendering a while in the future, and on a different JVM. A mechanism to supply hints that these things will be needed, and will be needed as a one-pass stream (this may be too low level) when they get serialized out, allows for much better actual throughput. The best way to provide this type of hinting would be hard to work out, but fun as heck to do -- and worth it!

This type of throughput-oriented hinting is hard to do through any existing O/R mapper I know of. It is not difficult to describe, however. It really begs for a flexible object query language. You can get the equivalent type of behavior in JDBC right now, but not in a way that is useful to OJB or Hibernate at least. This is just one example; you can use your imagination for others =)

The big problem here is that what I am talking about is most of a DBMS. You need to handle snapshotting for transactions, dirtying predicates, etc. It just uses a relational database for its actual backend. In theory EJBs were designed to be able to do this, but I don't think any of the implementations actually do. Is the problem just too hard? I have a lot of trouble believing that -- if you can formulate the questions correctly, you can pretty much build the solution. It just ain't easy to do -- and easy is seductive. Yay, hard problems!

This is also a big abstraction -- and one that bets it can provide the correct knobs to allow the programmer to dive through it when needed. There is risk in a big abstraction, but then again, there are reasons we use Ruby, er, I mean Java, instead of assembly ;-)

This is a big hunk of code, and dives into math instead of simple bit-pushing, making it fun code! Definitely outside the scope of my (one person) spare time programming, unfortunately =(
