Brian's Waste of Time

Sat, 11 Sep 2004

Using Lucene with OJB

Search is important! All too often search looks like where thing like '%that%'. Users know Google, and quite a few even know its query language at this point. Beyond our wanting to provide more functionality in search, users have come to expect it. Google seems simple, doesn't it?

Enter Lucene. I'll presume you've heard of it at least, if not used it. Lucene does full text indexing, and that is it. It does this really well. The beauty (well, one) is that you can index anything. In this case, I'll index an object being persisted by OJB. The key is to embed information required to retrieve the document being indexed.

Take a gander at a fairly simple Student class (this is from an app I am doing for my little brother, who is a professor (of such terrible subjects as rock climbing and white water kayaking, don't get me started)).

The primary use case for this application is for a student coop employee to find a student in the system, then find gear and check the gear out for the student. Finding the student is key, and that is best served by... searching! So we have a database record for each student, and want a convenient search facility which can search based on name, student id (idNumber), phone number, even address. Lucene makes this a snap. To do it, we just store the id (internal/pk id) in an unindexed field when we add a student in the StudentIndexer:

    public void add(final Student student) throws ServiceException {
        final Document doc = new Document();
        doc.add(Field.Text(NAME, student.getName()));
        doc.add(Field.Text(ID_NUMBER, student.getIdNumber()));
        doc.add(Field.Text(ADDRESS, student.getAddress()));
        doc.add(Field.Text(PHONE, student.getPhone()));
        doc.add(Field.UnIndexed(IDENTITY, student.getId().toString()));
        try {
            synchronized (mutex) {
                final IndexWriter writer = new IndexWriter(index, analyzer, false);
                writer.addDocument(doc);
                writer.optimize();
                writer.close();
            }
        }
        catch (IOException e) {
            throw new ServiceException("Unable to index student", e);
        }
    }

Notice the UnIndexed field on the Document? This tells Lucene to store this field with the record, but not to index it or search on it. When you retrieve the document you will get the field though. Perfect place to stash the primary key.

When we look for the students, we don't want to get back Lucene Document instances, though; we want to go ahead and get the nice domain model instances of Student. What we'll do is query against the index, pull the pks for all the hits out, then select the domain objects using those pks (from the StudentIndex):

    public List findStudents(final String search) throws ServiceException {
        return this.findStudents(search, Integer.MAX_VALUE);
    }

    public List findStudents(final String search, final int numberOfResults) throws ServiceException {
        final Query query;
        try {
            query = QueryParser.parse(search, StudentIndexer.NAME, analyzer);
        }
        catch (ParseException e) {
            throw new ServiceException("Unable to make any sense of the query", e);
        }
        final ArrayList ids = new ArrayList();
        try {
            final IndexReader reader = IndexReader.open(index);
            final IndexSearcher searcher = new IndexSearcher(reader);
            final Hits hits = searcher.search(query);
            for (int i = 0; i != hits.length() && i != numberOfResults; ++i) {
                final Document doc = hits.doc(i);
                ids.add(new Integer(doc.getField(StudentIndexer.IDENTITY).stringValue()));
            }
            searcher.close();
            reader.close();
        }
        catch (IOException e) {
            throw new ServiceException("Error while reading student data from index", e);
        }
        final List students = dao.findStudentsWithIdsIn(ids);
        Collections.sort(students, new Comparator() {
            public int compare(final Object o1, final Object o2) {
                final Integer id_1 = ((Student) o1).getId();
                final Integer id_2 = ((Student) o2).getId();
                for (int i = 0; i != ids.size(); i++) {
                    final Integer integer = (Integer) ids.get(i);
                    if (integer.equals(id_1)) {
                        return -1;
                    }
                    if (integer.equals(id_2)) {
                        return 1;
                    }
                }
                return 0;
            }
        });
        return students;
    }

The findStudents(String, int): List method is a little bit more complex than I like as it does a few things: it queries against the Lucene index, extracts the primary keys for the hits, queries for the students matching those pks (via the StudentDAO), and finally sorts the results (there is no way to specify the sort order in the database query, as it depends on the order of the hits from the Lucene query). With that though, we support queries such as Tiffany, which is simple, or a more fun one, name: Aching phone: ???-1234 or what not. Go look at the Lucene query parser syntax. It is worth noting that the above query defaults to searching on the name field if no specific field is specified. This seems to make sense to me =)
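The sorting step can also be done with a lookup table from primary key to hit position, which avoids walking the id list on every comparison. This is only a minimal sketch, assuming plain Integer ids; HitOrder and sortByHitOrder are names made up for illustration and are not part of the application above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not from the original app): puts objects fetched by
// primary key back into the order in which the search hits arrived.
final class HitOrder {
    private HitOrder() {}

    // Returns a new list with `items` ordered by the position of each item's
    // id in `rankedIds`. Items whose id is not in `rankedIds` sort last.
    static <T> List<T> sortByHitOrder(List<Integer> rankedIds, List<T> items,
                                      Function<T, Integer> idOf) {
        final Map<Integer, Integer> position = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            position.putIfAbsent(rankedIds.get(i), i); // first occurrence wins
        }
        final List<T> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt(
                (T item) -> position.getOrDefault(idOf.apply(item), Integer.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        // Ids stand in for Student objects here to keep the sketch short.
        List<Integer> hits = Arrays.asList(42, 7, 19);
        List<Integer> fromDatabase = Arrays.asList(7, 19, 42);
        System.out.println(sortByHitOrder(hits, fromDatabase, id -> id));
    }
}
```

With real domain objects the last argument would be something like a getId accessor.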

If you look at the StudentIndex and StudentIndexer you will see there are also facilities for adding and removing documents from the Lucene index. This gets important on any insert/update/delete operation. The update is important to catch as you need to remove the old entry and insert a new one in the index. This is best done (my opinion) via an aspect which picks these operations out. That is outside the scope of this article though ;-)
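To see why an update has to be a remove followed by an add, here is a toy in-memory inverted index. It is a sketch for illustration only: TinyIndex is a made-up class, it has nothing to do with Lucene's API, and its tokenizing is deliberately naive.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not Lucene): maps each term to the ids
// of the records containing it.
final class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Remembers which terms each id was indexed under, so remove() can undo them.
    private final Map<Integer, Set<String>> termsById = new HashMap<>();

    // Assumes `id` is not currently indexed; use update() for existing ids.
    void add(int id, String text) {
        final Set<String> terms = new HashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        termsById.put(id, terms);
    }

    void remove(int id) {
        final Set<String> terms = termsById.remove(id);
        if (terms == null) {
            return; // nothing indexed under this id
        }
        for (String term : terms) {
            final Set<Integer> ids = postings.get(term);
            ids.remove(id);
            if (ids.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    // Without the remove, terms from the old text would keep matching the record.
    void update(int id, String text) {
        remove(id);
        add(id, text);
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "Tiffany Aching");
        index.update(1, "Tiffany Weatherwax");
        System.out.println(index.search("aching"));
        System.out.println(index.search("weatherwax"));
    }
}
```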

For a larger application with more things being indexed (this just has two searchable domain types) I might generalize the search capability via a DocumentFactory such as:

public class BeanDocumentFactory implements DocumentFactory {
    public Document build(Object entity) {
        final Document document = new Document();
        try {
            final BeanInfo info = Introspector.getBeanInfo(entity.getClass());
            final PropertyDescriptor[] props = info.getPropertyDescriptors();
            for (int i = 0; i != props.length; ++i) {
                final PropertyDescriptor prop = props[i];
                final String name = prop.getName();
                final Method reader = prop.getReadMethod();
                final Object value = reader.invoke(entity, new Object[]{});
                final Field field = Field.Text(name, String.valueOf(value));
                document.add(field);
            }
        }
        catch (Exception e) {
            throw new RuntimeException("Handle these in real application", e);
        }
        return document;
    }
}

But I have not needed to generalize it for a real project yet =)

Speaking of Lucene (which rocks), I am eagerly anticipating Erik Hatcher's new book, Lucene in Action. If it is anything like Erik and Steve Loughran's Java Development with Ant, Lucene will be a lucky project to have it in circulation.

9 writebacks [/src/java/ojb] permanent link

Thu, 02 Sep 2004

Graph Paging =~ Caching (sort of, but not really)

Christian Bauer called my spade a spade on Dion's blog. He pointed out that I am describing caching behavior, which is true, but it is a cache in the sense that an executing application is an in-memory cache of the binaries on disk.

A better description of what I want is to implement an in-memory database (optimized for storing Java objects) backed by a remote database. I know of at least one effort to do this for relational databases in general, but that is a bigger problem than what I wanted to address, which is simply the same idea for Java objects instead of arbitrary relational data (smaller problem set is nice).

There are a few things you would need in order to see a big performance gain. The first is that the objects would need to be capable of participating in optimistic transactions. If they couldn't be optimistic, every operation on them would have to pass through to the backing database in order to maintain transaction semantics there. Luckily, most things can participate in optimistic transactions. The ones that cannot simply always need to pass through.


Wed, 01 Sep 2004

Graph Manipulation vs Reporting

My last post on graph paging continued to confuse (con-fuse not boggle -- I really need a better word) two ideas somewhat. Let's see if I can do better =)

To go back to programming kindergarten: we have four typical operations on persistent state: create, read, update, and delete. Three of these typically involve working with a small object graph: create, update, delete. They manipulate and mutate, and tweak, and work with, and generally implement confusing business so-called rules. The final one, read, has a big split.

The first type of read op is the presentation of small data: username, birthday, shopping cart contents. The second type of read op handles huge volumes of data. These are almost always done via named queries because they would break O/R graph mapping tools, or rather, break the JVM by generating the lovable java.lang.OutOfMemoryError if two got executed concurrently somehow.

The first type of op, the small graph op, is mostly satisfied by the current crop of O/R mapping tools. The second set ranges from not too bad (OJB's persistence broker) to annoying (Hibernate's session) to not actually practicable (EJB CMP; they invented the "fast lane reader" for this one). The strongest query language I know of for this is probably HQL. (Hey, I love OJB, and generally prefer it to Hibernate as it is lighter weight/lower level in my preferred form, but HQL is a very nice query language =) It isn't perfect, but it is probably the most useful one we have right now. Oddly enough, while its language constructs are optimized for writing reporting queries (lots of results), it is tied to a small-graph manipulation library (Hibernate).

So, this is completely different from the aforementioned previous post. It doesn't touch on any of the ideas. The reason is that I think these two very different beasts should probably be separated, or at least handled very differently. Small graph manipulation is, I strongly suspect, much better served via a graph-paging system. Reporting is best served, I am completely convinced, by a result stream.

The nice part here is that you should be able to re-use huge swathes of the code =)

The closest solution I have right now is probably using OJB's persistence broker (report query by iterator) for reporting and the OTM or ODMG (ick) for graph manipulation (this will change in 1.1, and already has in CVS, where full object-transaction graph manipulation is available when wanted from the PB and you can use the same client interface for high level and low level ops :o). Using iBatis with Hibernate also seems very popular, and works pretty well (high volume reports go through iBatis, graph manipulation through Hibernate).

Speaking of large result sets, I also really want to be able to pipeline these puppies. A callback-based reader would be handy: it would grab non-object-transactional instances (which are immediately recycled after the callback returns, to help with memory thrashing) and could sit at the end of a pass-through right from a streaming result set. Luckily, I don't have to deal with multi-gigabyte result sets anymore (sometimes I miss it, though, unusual constraints are fun to work with).
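Roughly what I have in mind for the callback-based reader, as a sketch only. StreamingReport, Row, and RowCallback are hypothetical names, and an Iterator of raw arrays stands in for a streaming result set; none of this is an existing OJB interface.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a callback-based report reader. One mutable Row is
// recycled for every record, so memory use stays flat however long the stream is.
final class StreamingReport {
    private StreamingReport() {}

    // Mutable holder, reused for each row. Not tracked by any transaction.
    static final class Row {
        String name;
        long amount;
    }

    interface RowCallback {
        // The row is only valid until this method returns; copy what you need.
        void handle(Row row);
    }

    // Pushes every raw record through the callback and returns the row count.
    // Each raw record is assumed to be {String name, Number amount}.
    static int stream(Iterator<Object[]> source, RowCallback callback) {
        final Row row = new Row();
        int count = 0;
        while (source.hasNext()) {
            final Object[] raw = source.next();
            row.name = (String) raw[0];
            row.amount = ((Number) raw[1]).longValue();
            callback.handle(row);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Object[]> records = new ArrayList<>();
        records.add(new Object[]{"kayak", 3});
        records.add(new Object[]{"rope", 12});
        final long[] total = {0};
        int rows = stream(records.iterator(), row -> total[0] += row.amount);
        System.out.println(rows + " rows, total " + total[0]);
    }
}
```

The catch with recycling is the one noted in the comment: a callback that stashes the Row itself will see it change under it.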


Graph Paging

Playing with JDO 2 fetch groups, ZODB, thinking about TranQL (for two months now), playing with Prevayler, and looking at TORPEDO (need to run OJB against it when I have a chance) something clicked for me which I think clicked for some other people a long time ago -- but somehow got lost in the hullabaloo. We may all be doing O/R mapping wrong. Seriously, we probably are.

The current popular approach is a thin wrapper around JDBC. It is what OJB, Hibernate, and JPOX all do. I cannot comment on Kodo and Toplink as I cannot go browse around their sources, but I suspect it is the same. This is how we are used to thinking about it -- the objects you get are basically a stream (or collection) of database results.

This isn't really what they are though. They are really closer to a swapped in page of the entire object graph. The query mechanism for the object graph and the query mechanism for the backend get confused (in the con-fuse sense). The JDO spec has the right idea in separating object queries from persistence store queries (though I do tend to agree with Gavin King that the JDOQL query language itself is somewhat less than elegant). The editing context can contain more or less than has been queried for, as long as what is accessed is available when it is needed.

When you need to obtain a handle on an instance, a query language is bloody useful. OGNL defines a better object query language than either OQL, JDOQL or HQL -- if you are talking purely objects. HQL evolved as it did to avoid the loss inherent in this abstraction, and works nicely. You are querying into the editing context though, and the context can determine, separately from the exact query, what it does not already have loaded (thank you Jeremy and Dain). This is a lot of work probably best done in a Haskell-style language optimized for doing fun maths rather than pushing bits.

Once you are maintaining graph pages instead of flat contexts, and issuing queries against the page rather than the backend, you can do nice things like absurdly optimize your queries into the backend: query the backend only for the difference between the predicate for the current query and the union of all predicates known to be in the current page, that is, only for what the page does not already hold (thank you, again, Jeremy and Dain). The paging system certainly knows about the database, and needs to be able to write extremely optimized code (SQL) to pull data out of it, but the client of the paging system really is better off being able to describe queries in terms of object behaviors.

Providing hinting about what objects are going to be needed, rather than how to pull them from the RDBMS, becomes a lot more useful as you can express the same intention in a way that lets the system know what you want, rather than flat out telling it. (Hinting is what you are really doing when you ask Postgres to use a join, unless you do a lot of configuration to make it not so. Postgres is the only RDBMS whose internals I have poked at much, Oracle's not being available to me.) A perfect example of an optimization that would be tough to do by hand here is to stream elements in a collection down the join chain from the primary queried entities rather than pulling them in the initial join. In HQL you would join them as you *know* you will need them, but what you really know is that the JSP needs them for rendering a while in the future, and on a different JVM. A mechanism to supply hints that these things will be needed, and will be needed as a one-pass stream (this may be too low level) when they get serialized out, allows for much better actual throughput. The best way to provide this type of hinting would be hard to work out, but fun as heck to do -- and worth it!
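A toy version of the "only fetch what the page lacks" idea, under a big simplifying assumption: every predicate is just a half-open range over an integer key. Real query predicates are much harder to subtract from each other. PageCoverage and Range are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: given the key ranges already loaded into the page,
// work out which parts of a newly requested range still need a backend query.
final class PageCoverage {
    private PageCoverage() {}

    // Half-open key range: `from` inclusive, `to` exclusive.
    static final class Range {
        final int from;
        final int to;

        Range(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + "," + to + ")";
        }
    }

    // Returns the pieces of `wanted` not covered by any range in `loaded`.
    static List<Range> missing(Range wanted, List<Range> loaded) {
        final List<Range> sorted = new ArrayList<>(loaded);
        sorted.sort(Comparator.comparingInt((Range r) -> r.from));
        final List<Range> gaps = new ArrayList<>();
        int cursor = wanted.from; // everything in wanted below cursor is accounted for
        for (Range r : sorted) {
            if (r.to <= cursor) {
                continue; // entirely behind the cursor
            }
            if (r.from >= wanted.to) {
                break; // entirely past the requested range
            }
            if (r.from > cursor) {
                gaps.add(new Range(cursor, r.from));
            }
            cursor = Math.max(cursor, r.to);
            if (cursor >= wanted.to) {
                break;
            }
        }
        if (cursor < wanted.to) {
            gaps.add(new Range(cursor, wanted.to));
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Range> page = Arrays.asList(new Range(10, 20), new Range(50, 120));
        System.out.println(missing(new Range(0, 100), page));
    }
}
```

Each returned gap would become one narrow backend query, and the page's list of known ranges would grow accordingly.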

This type of throughput-oriented hinting is hard to do through any existing O/R mapper I know of. It is not difficult to describe, however. It really begs for a flexible object query language. You can get the equivalent type of behavior in JDBC right now, but not in a way that is useful to OJB or Hibernate, at least. This is just one example, you can use your imagination for others =)

The big problem here is that what I am talking about is most of a DBMS. You need to handle snapshotting for transactions, dirtying predicates, etc. It just uses a relational database for its actual backend. In theory EJBs were designed to be able to do this, but I don't think any of them actually do. Is the problem just too hard? I have a lot of trouble believing that -- if you can formulate the questions correctly, you can pretty much build the solution. It just ain't easy to do -- and easy is seductive. Yay hard problems!

This is also a big abstraction -- and one that bets it can provide the correct knobs to allow the programmer to dive through it when needed. There is risk in a big abstraction, but then again, there are reasons we use Ruby, er, I mean Java, instead of assembly ;-)

This is a big hunk of code, and dives into math instead of simple bit-pushing, making it fun code! Definitely outside the scope of my (one person) spare time programming, unfortunately =(


Mon, 09 Aug 2004

Algorithmic Fun-ness

I spent a chunk of time on Saturday cleaning up grafolia (and importing into ASF CVS) and plunking in the hooks between it and OJB to let grafolia manage OJB's object states. It works really nicely. One of the big goals (personally) for me was to support statement reordering in a robust way. This has been fun =)

Topological sorting is fun =) I haven't been exposed to this particular problem before (though it smells like other directed graph problems), so I am trying to come up with a not-too-inefficient algorithm on my own, before I go research the best fit I can find for my class of problem.

If you want to try, a basic comparator function might look like:

public static class TopologicalComparator
{
    public static final int BEFORE = -1;
    public static final int AFTER = 1;
    public static final int EQUAL = 0;
    public static final int NO_COMPARISON = Integer.MAX_VALUE;

    public int compare(final Change changeOne, final Change changeTwo) {
        /* ... */
    }
}

That said, the comparator function is probably the wrong approach, and building a directed graph of dependencies and traversing it naively will probably yield an easier-to-follow (and maintain) algorithm. Wheee, fun fun.
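For the directed-graph route, the textbook starting point is Kahn's algorithm: repeatedly emit a node with no unmet prerequisites. The sketch below is that standard algorithm, not what grafolia does; DependencySort is a hypothetical name, and plain strings stand in for Change instances.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Kahn's algorithm over a dependency map (hypothetical helper for illustration).
final class DependencySort {
    private DependencySort() {}

    // `dependsOn` maps each node to the nodes that must come before it.
    // Throws IllegalStateException if the dependencies contain a cycle.
    static <T> List<T> order(Map<T, List<T>> dependsOn) {
        final Map<T, Integer> pending = new LinkedHashMap<>(); // unmet prerequisite counts
        final Map<T, List<T>> dependents = new HashMap<>();    // reverse edges
        for (Map.Entry<T, List<T>> entry : dependsOn.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (T before : entry.getValue()) {
                pending.putIfAbsent(before, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(before, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        final Deque<T> ready = new ArrayDeque<>();
        for (Map.Entry<T, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        final List<T> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            final T next = ready.remove();
            ordered.add(next);
            for (T dependent : dependents.getOrDefault(next, Collections.emptyList())) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (ordered.size() != pending.size()) {
            throw new IllegalStateException("Dependency cycle detected");
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inserts = new LinkedHashMap<>();
        inserts.put("insert orderLine", Arrays.asList("insert order", "insert product"));
        inserts.put("insert order", Arrays.asList("insert customer"));
        System.out.println(order(inserts));
    }
}
```

For statement reordering the edges would come from foreign keys: a row that references another has to be inserted after it (and deleted before it).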


Thu, 08 Jul 2004

OJB 1.1 Fun -- PersistenceAware

With OJB 1.0 finally released there has been tons of activity on cvs HEAD (OJB 1.1) for OJB. Most of my work thus far has been to beef up the capabilities of the most functional OJB client API -- the PersistenceBroker. Added change detection (via listeners) which works pretty nicely: it will do a copy-fields-on-read and compare-on-commit for classes which are not aware of persistence. This is basically the OTM and ODMG behavior, Hibernate behavior, TopLink behavior etc. It requires some small amount of overhead (basically 2x memory usage, and a fieldwise compare on the whole object graph on commit). The more fun one is that when classes are willing to maintain their own persistent state this overhead disappears =)
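For the curious, the copy-fields-on-read and compare-on-commit idea boils down to something like the following. This is a bare sketch using reflection, not OJB's actual implementation; SnapshotTracker and SampleBeer are made-up names, only fields declared directly on the class are looked at, and the copy is shallow.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of snapshot-based change detection for classes that
// know nothing about persistence.
final class SnapshotTracker {
    // Keyed by identity: two equal-looking objects are still tracked separately.
    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

    // Example persistence-unaware class used in main().
    static final class SampleBeer {
        String brand;
        double price;

        SampleBeer(String brand, double price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Call when the object is read: remembers its current field values.
    void track(Object entity) {
        snapshots.put(entity, copyFields(entity));
    }

    // Call on commit: names of the fields whose values differ from the snapshot.
    Set<String> dirtyFields(Object entity) {
        final Map<String, Object> before = snapshots.get(entity);
        if (before == null) {
            throw new IllegalArgumentException("Object was never tracked");
        }
        final Set<String> dirty = new TreeSet<>();
        for (Map.Entry<String, Object> now : copyFields(entity).entrySet()) {
            if (!Objects.equals(before.get(now.getKey()), now.getValue())) {
                dirty.add(now.getKey());
            }
        }
        return dirty;
    }

    // Shallow copy of the non-static, non-transient fields declared on the class itself.
    private static Map<String, Object> copyFields(Object entity) {
        final Map<String, Object> values = new HashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            final int modifiers = field.getModifiers();
            if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                values.put(field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        SnapshotTracker tracker = new SnapshotTracker();
        SampleBeer beer = new SampleBeer("Stout", 4.5);
        tracker.track(beer);
        beer.price = 5.0;
        System.out.println(tracker.dirtyFields(beer));
    }
}
```

The second copy of every field value is where the roughly doubled memory comes from, and dirtyFields is the fieldwise compare paid on commit.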

The contract for maintaining your own state is pretty simple: you'll be given a reporter to which you can report state changes (field gets dirtied, etc), and need to be able to restore your values on rollback when signalled. That's it. Here is a Beer which knows how to maintain its state.

import org.apache.ojb.broker.dirty.PersistenceAware;
import org.apache.ojb.broker.dirty.StateReporter;

public class SmartBeer extends DomesticBeer implements PersistenceAware {
    private StateReporter reporter = null;
    private Integer iq = new Integer(100);

    private Double oldPrice = null;
    private String oldBrand = null;
    private Integer oldIQ = null;

    public void setStateReporter(StateReporter reporter) {
        this.reporter = reporter;
    }

    public void restore() {
        if (oldIQ != null) this.iq = oldIQ;
        if (oldBrand != null) this.setBrand(oldBrand);
        if (oldPrice != null) this.setPrice(oldPrice.doubleValue());
        this.oldBrand = null;
        this.oldPrice = null;
        this.oldIQ = null;
    }

    public int getIQ() {
        return iq.intValue();
    }

    public void setIQ(int new_iq) {
        if (reporter != null && reporter.isTransactional()) {
            // Keep only the first old value, so restore() goes back to the
            // pre-transaction state even if the field is set more than once.
            if (this.oldIQ == null) this.oldIQ = iq;
            reporter.makeDirty("iq");
        }
        this.iq = new Integer(new_iq);
    }

    public void setPrice(double price) {
        if (reporter != null && reporter.isTransactional()) {
            if (this.oldPrice == null) this.oldPrice = new Double(this.getPrice());
            reporter.makeDirty("price");
        }
        super.setPrice(price);
    }

    public void setBrand(String brand) {
        if (reporter != null && reporter.isTransactional()) {
            if (this.oldBrand == null) this.oldBrand = this.getBrand();
            reporter.makeDirty("brand");
        }
        super.setBrand(brand);
    }
}

Pretty gnarly, if straightforward. This smells of... crosscutting concern! Here is a less intelligent Beer:

public class LessIntelligentBeer implements Beer {
    
    private String brand;
    private Double price;
    private Integer id;

    public String getBrand() {
        return brand;
    }

    public void setBrand(String b) {
        this.brand = b;
    }

    public double getPrice() {
        return price.doubleValue();
    }

    public void setPrice(double price) {
        this.price = new Double(price);
    }
}

This beer would need to use the less efficient option, except that we can go ahead and enhance it via a static aspect and helper class:

package org.skife.ojb;

import org.apache.ojb.broker.dirty.StateReporter;
import org.aspectj.lang.Signature;

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public aspect PersistenceEnhancer {
    pointcut fieldSet(PersistenceEnhanced aware): set(* PersistenceEnhanced+.*) && target(aware);

    before(PersistenceEnhanced aware) :  fieldSet(aware) {
        try {
            Signature signature = thisJoinPoint.getSignature();
            Field field = signature.getDeclaringType().getDeclaredField(signature.getName());
            if (!Modifier.isTransient(field.getModifiers())) {
                field.setAccessible(true);
                StateReporter rep = aware.getStateReporter();
                if (rep != null && rep.isTransactional())
                {
                    rep.makeDirty(field.getName());
                    try {
                        aware.makeDirty(field, field.get(aware));
                    }
                    catch (IllegalAccessException e) {
                        throw new RuntimeException(e);
                    }
                }
            }
        } catch (NoSuchFieldException e) {
            throw new RuntimeException(e);
        }
    }
}

package org.skife.ojb;

import org.apache.ojb.broker.dirty.PersistenceAware;
import org.apache.ojb.broker.dirty.StateReporter;

import java.util.HashMap;
import java.util.Iterator;
import java.lang.reflect.Field;

public class PersistenceEnhanced implements PersistenceAware {
    private transient StateReporter reporter;
    private final HashMap oldValues = new HashMap();

    public void setStateReporter(StateReporter stateReporter) {
        this.reporter = stateReporter;
        oldValues.clear();
    }

    public StateReporter getStateReporter() {
        return reporter;
    }

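    public void makeDirty(Field field, Object oldValue) {
        // Keep only the first value recorded for a field, so restore() rolls
        // back to the pre-transaction state rather than an intermediate one.
        if (!oldValues.containsKey(field)) {
            oldValues.put(field, oldValue);
        }
    }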

    public void restore() {
        for (Iterator itty = oldValues.keySet().iterator(); itty.hasNext();) {
            Field field = (Field) itty.next();
            try {
                field.setAccessible(true);
                field.set(this, oldValues.get(field));
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
        oldValues.clear();
    }
}

This aspect simply defines behavior on PersistenceEnhanced-derived classes to capture field changes and report dirties, and the helper class provides a facility for restoring dirties. Now we need one more aspect which gives our specific cases:

import org.skife.ojb.PersistenceEnhanced;

public aspect BeerEnhancer
{
    declare parents: LessIntelligentBeer extends PersistenceEnhanced;
}

And LessIntelligentBeer now has the same performance characteristics as carefully managed updates in the PB API, but the same state management behavior as the higher overhead ODMG or OTM APIs, with no extra code (assuming you use these aspects). This only works against OJB's cvs HEAD, and the APIs will likely change before release, but the idea is nice. The other key thing is that the behavior is the same against PersistenceAware classes and ones which are not -- you just get better performance (speed and memory) with copy-on-write and no need for comparisons on commit!

Special thanks to Jon Tirsen for sample code for finding Field instances from pointcuts.


Wed, 30 Jun 2004

Apache Object/Relational Bridge 1.0 Released!

Woo hoo! What else can I say?


Mon, 14 Jun 2004

Object/Relational Bridge 1.0rc7 Last Night

Pushed the, hopefully, final release candidate for OJB 1.0 last night. Assuming no major problems are discovered we should release 1.0 next weekend!

We are all incredibly eager to make the 1.0 branch as we have pretty much had a feature freeze for way too long. Exciting things are coming as soon as we are allowed to add functionality and change apis =)


Wed, 09 Jun 2004

Thoughts on JDO

With the promise from the EG that JDO 2.0 won't require bytecode enhancement, it gets a lot more exciting =) We're working on strong support for 1.0.1 in OJB (delayed for a year from the initial effort because, well, no one actually wanted it) and some recent discussions have fed some thoughts on it.

First, the query language. Gavin of Hibernate thinks it is an abomination, and I completely agree that it is bizarre when you think you are querying a relational database -- but there is a key point: you may not be. What JDOQL manages to do is define a query in terms of the Java object graph rather than the underlying data store. Yeah, O2 and its derivatives (OQL, HQL, EJBQL, etc) are more elegant, but that is because functional weenies designed the original O2 ;-) JDOQL isn't as bad as I originally thought, and it might even grow on me to the point where I think in it instead of O2 for object queries.

Secondly, the client-side JDO API is bloody simple, and works well. The query facility is cleanly separated from simple retrieval by identity (so if you really hate JDOQL you can pretty easily extend an impl to support different query systems), transactions are flexible and simply designed, and using it is straightforward. I like the client side, and apparently others do too -- most popular persistence systems work in the same way if you squint.

The only complicated part of JDO is the JDOQL queries (and I guess supporting the optional transaction semantics, but they are optional!) and as Thomas D. pieces together a good AST-based parser and translator for OJB, I'm seeing how easy it would be to plug anything in as the back end. The JDO EG has, since the beginning, focused on persistence in general instead of persistence in relational databases. Sure, RDBMSs will remain the most common, but they are far from the only backend around.

With that in mind, I think I'd like to try to keep OJB's JDO implementation as free of OJB specific code (the PersistenceBroker or OTM backend) as possible -- the only thing that really needs exposure there is the query, the transaction, and the identity. If we can engineer it such that we can pass back the query AST (or accept a visitor and do AST traversal on the OJB-JDO side) and provide transaction and CRUD hooks it should be straightforward to use it as the basis to easily implement other backends. I know the original RI took this approach (you can plug OJB in as a backend to the RI thanks to some adapters Thomas M. did when the JDO spec was first released). In theory you could use this implementation to adapt XML:DB, Hibernate, or even Entity Beans pretty easily to use the JDO front end. The key is just the need for something to handle the state tracking, transactions-on-objects, and query parsing (you would still need to provide a visitor or be able to work with the AST to translate the query to your choice of backend though, so not a free lunch).
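To make the visitor idea concrete, here is a deliberately tiny sketch: a two-node query AST and one visitor that renders it as a parameterized SQL WHERE fragment. Everything in it (QueryAst, Equals, And, SqlWhere) is invented for illustration; it is not JDOQL, and it is not the parser Thomas D. is building.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini query AST with a visitor, to show how a backend could
// translate a parsed query without the front end knowing about the backend.
final class QueryAst {
    private QueryAst() {}

    interface Visitor<R> {
        R visitEquals(Equals node);
        R visitAnd(And node);
    }

    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    // field == value
    static final class Equals implements Node {
        final String field;
        final Object value;

        Equals(String field, Object value) {
            this.field = field;
            this.value = value;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitEquals(this);
        }
    }

    // left && right
    static final class And implements Node {
        final Node left;
        final Node right;

        And(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitAnd(this);
        }
    }

    // One possible backend. Field names are used as column names unescaped,
    // which a real translator would map and validate.
    static final class SqlWhere implements Visitor<String> {
        final List<Object> parameters = new ArrayList<>();

        @Override
        public String visitEquals(Equals node) {
            parameters.add(node.value);
            return node.field + " = ?";
        }

        @Override
        public String visitAnd(And node) {
            return "(" + node.left.accept(this) + " AND " + node.right.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Node query = new And(new Equals("name", "Tiffany"), new Equals("year", 2));
        SqlWhere sql = new SqlWhere();
        System.out.println(query.accept(sql) + " " + sql.parameters);
    }
}
```

A different backend would supply its own Visitor implementation (an XPath builder for XML:DB, say) while the AST and the front end stay untouched.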


Sun, 04 Apr 2004

Getting Rolling on Native JDO in OJB

I have recently been reminded of the importance of standards, and it has had a nice effect -- I got off my high horse and started implementing a real JDO interface to OJB. The core of it is in place now -- actually, has been for a while as Matt, Raghu, and Oleg did all the heavy lifting for the object transaction stuff in the OTM -- all I am doing is wiring it up, and adding a JDOQL interpreter, to work in a JDO compliant manner. It isn't actually very useful yet as filters on queries are ignored, though. Still need to do the fun part of efficiently translating JDOQL to OJB's query by criteria api. Should be pretty straightforward, I hope =)

The initial design, and I intend to try hard to keep it this way, is to make the client side independent of the SPI interfaces. You should be able to use OJB JDO without bytecode enhancement -- as long as you don't try to access the SPI directly (which you are not supposed to do anyway). If you do use a bytecode enhancer OJB will certainly take advantage of it for state management, etc, but then again, this will be equally usable by clients of any of the interfaces. It just means using different EditingContext and FieldAccessStrategy implementations in the OJB config.

Native JDO won't be in 1.0, but will be there for 1.1, and may make it into 1.0.x -- though don't hold your breath on 1.0.x. If 1.1 takes half as long as 1.0 has (I think we are approaching the Struts release schedule at this point, if we aren't slower already) I will definitely backport the code though, if only as a download here or on SF.net.

The reason for finally getting off my butt to work on it ties directly to a conversation with Thomas Risberg, Dmitriy Kopylenko, and Bill Thompson after the most recent PhillyJUG meeting. We discussed data access in 10+ or 20+ year lifespan applications (ie, Java as Cobol). At that point any proprietary api for data access needs to go right out the window -- sure, I expect OJB to be around in ten years (community > code at Apache is for a reason), but I don't expect the interfaces to be compatible. I expect JDO to remain pretty compatible -- Sun and the JCP have shown excellent restraint in that regard (notice that Entity Beans aren't even deprecated yet!)


Wed, 31 Mar 2004

Apache OJB PhillyJUG Slides

The OJB and Spring presentation at PhillyJUG seemed to go pretty well -- at least based on the questions during and discussions after. The slides are available for download now -- the references to "Thomas" are to Thomas Risberg, who was doing the Spring presentation.

No one in the audience had used OJB but a couple people had used Hibernate, so I got a lot of "Hibernate works this way, how does OJB do it?" type questions. When I talked about designing applications for O/R mapping I wound up highlighting how things apply in the Hibernate world as well. Was sort of funny. I also learned that a lot of people are not writing multi-thread/multi-vm safe Hibernate apps right now as they tend to use the obtain-object, copy properties onto some web tier transfer thingie (like a dynabean), then copy off of the dynabean back onto the same object from the previous request, and re-persist that object. The problem here is that data that has changed gets overwritten, and worse, objects in collections can be re-inserted or deleted via a "most recent change wins" situation =(

Repeat after me three times -- start transaction, re-query for your entity, apply changes to the freshly obtained one, store it back (commit transaction). Do this all in the context of a single transaction.

If you need to support long-lived transactions (they open the page, walk away for five minutes, come back, submit the form) and cannot risk concurrent modification, use optimistic transactions and make sure to take the user to a conflict resolution page if the transaction fails (someone updated the data while they were filling out their form). It is almost always a bad idea to just blindly take the most recent submission and store it, particularly when it is an object graph where a lot of things are just loaded for ancillary reasons (i.e., in the referenced collections).
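To make the re-query-then-apply rule concrete, here is a minimal sketch of a version-checked update. It does not use OJB or Hibernate APIs at all: `Store`, `Student`, `ConflictException`, and `updatePhone` are hypothetical names, the "database" is a plain in-memory map, and a synchronized method stands in for a real transaction.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticUpdateSketch {

    /** Immutable snapshot of a row; the version goes up on every successful write. */
    static final class Student {
        final long id;
        final String phone;
        final int version;

        Student(long id, String phone, int version) {
            this.id = id;
            this.phone = phone;
            this.version = version;
        }
    }

    /** Signals that the data changed after the user loaded the form. */
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory stand-in for a table (not a real persistence API). */
    static final class Store {
        private final Map<Long, Student> rows = new HashMap<>();

        synchronized void insert(long id, String phone) {
            rows.put(id, new Student(id, phone, 0));
        }

        synchronized Student find(long id) {
            return rows.get(id);
        }

        /**
         * Re-query, check the version the user saw, apply the change to the
         * fresh copy, and store it. The synchronized block plays the role of
         * the transaction boundary.
         */
        synchronized Student updatePhone(long id, int versionSeenByUser, String newPhone) {
            Student fresh = rows.get(id);
            if (fresh == null) {
                throw new ConflictException("student " + id + " no longer exists");
            }
            if (fresh.version != versionSeenByUser) {
                throw new ConflictException("student " + id + " was modified by someone else");
            }
            Student updated = new Student(id, newPhone, fresh.version + 1);
            rows.put(id, updated);
            return updated;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.insert(1L, "555-0100");

        // Two users load the same form and both see version 0.
        int versionOnForm = store.find(1L).version;

        store.updatePhone(1L, versionOnForm, "555-0111"); // first submit wins
        try {
            store.updatePhone(1L, versionOnForm, "555-0122"); // stale submit
        } catch (ConflictException e) {
            // A web app would send the user to a conflict resolution page here.
            System.out.println("Conflict: " + e.getMessage());
        }
        System.out.println("Stored phone: " + store.find(1L).phone);
    }
}
```

The point of the sketch is only the shape: the version the user saw travels with the form, and the write is refused rather than silently winning when that version is stale.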

2 writebacks [/src/java/ojb] permanent link

Sun, 18 Jan 2004

The Object Transaction Manager is Supported in OJB 1.0!

Happy day, we decided to support the OTM in OJB 1.0. This is a really big deal as the OTM provides object space transactions, including distributed object space transactions, in non-managed environments. It makes the question "which API should I use in OJB" very easy to answer -- the OTM.

The other big benefit is real support for truly transparent persistence -- no need for makePersistent(...) calls as persistence by reachability (reference or collection) is implemented, so elements created via a simple new and added to a collection on a persistent object will be inserted automagically, as will direct references (1:1). Woot!
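For anyone who hasn't run into persistence by reachability before, here is a toy illustration of the idea. This is not the OTM API: `ToySession`, `attach`, `commit`, `Order`, and `LineItem` are hypothetical names, and "inserting" just means recording the object in a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReachabilitySketch {

    static final class LineItem {
        final String name;

        LineItem(String name) {
            this.name = name;
        }
    }

    static final class Order {
        final List<LineItem> items = new ArrayList<>();
    }

    /** Toy session that tracks persistent objects by identity (hypothetical, not OJB). */
    static final class ToySession {
        private final Set<Object> persistent =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

        /** Objects that commit() decided needed an insert, in discovery order. */
        final List<Object> inserted = new ArrayList<>();

        /** Marks an order as already persistent, as if it had been loaded from the database. */
        void attach(Order order) {
            persistent.add(order);
        }

        /** Walks the collections of persistent orders and "inserts" anything new it reaches. */
        void commit() {
            for (Object candidate : new ArrayList<>(persistent)) {
                if (candidate instanceof Order) {
                    for (LineItem item : ((Order) candidate).items) {
                        // Set.add returns true only for objects not seen before.
                        if (persistent.add(item)) {
                            inserted.add(item);
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        Order order = new Order();
        session.attach(order);

        // Plain "new" plus a collection add; no explicit persist call on the item.
        order.items.add(new LineItem("climbing rope"));

        session.commit();
        System.out.println("Inserted on commit: " + session.inserted.size());
    }
}
```

A real implementation has to follow every mapped reference and collection and work out deletes too; the sketch only shows why no explicit persist call is needed for the new element.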

This was all doable under the ODMG API previously, but the PersistenceBroker API, despite being a lower level API, is just much nicer to use most of the time -- particularly for programmatic construction, and the PB style query-by-criteria and query-by-example type queries are both supported directly on the OTM.

The biggest deal is that the PB API being supported on the OTM makes implementing other query mechanisms really easy (JDO will be built on it, for example), as the PB is pretty much designed for use in building higher level APIs. The current ODMG implementation is built directly on the PB; for 1.1 the ODMG layer will be moved to run on top of the OTM.

Side effect -- I have a lot of docs to write on the OTM and a very short time to do it before we release, as the last couple of bugs are about wrapped, and the biggest (a dedicated lock server for distributed object locks) is in as of this morning thanks to the indefatigable Thomas Mahler.

3 writebacks [/src/java/ojb] permanent link

Sun, 16 Nov 2003

Thinking About Queries

I got into a very fun discussion with James Strachan about Groovy Data Objects -- which are modeled more after ADO than JDO. GDO looks like

collection.select { where { b > 100 && b < 5000 } orderBy { ascending { [ b, c ] } descending { x } } }

Or, for projections

collection.select { property b { get { bar.order.amount } set { bar.order.amount = it } } where { b > 100 && b < 5000 } orderBy { ascending { [ b, c ] } descending { x } } }

The idea of a query as a closure is appealing. I may have to try to do a GDO style query-as-closure for OJB during the hackathon. Groovy provides access to the AST of GroovyObjects, so traversing that AST to build a query-by-criteria shouldn't be terrible. Off to code I go.
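As a rough idea of what that traversal might look like, here is a sketch in plain Java. It does not touch the Groovy AST or OJB's Criteria classes: `Expr`, `Compare`, `And`, and `Criteria` are hypothetical stand-ins, with the tree built by hand where a real version would get it from the closure.

```java
import java.util.ArrayList;
import java.util.List;

public class CriteriaFromAstSketch {

    /** Marker for nodes of a tiny, hand-rolled expression tree. */
    interface Expr {
    }

    /** A leaf such as "b > 100". */
    static final class Compare implements Expr {
        final String field;
        final String op;
        final Object value;

        Compare(String field, String op, Object value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    /** A conjunction of two sub-expressions. */
    static final class And implements Expr {
        final Expr left;
        final Expr right;

        And(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Accumulates a SQL-like condition plus its bind parameters. */
    static final class Criteria {
        final StringBuilder sql = new StringBuilder();
        final List<Object> params = new ArrayList<>();
    }

    /** Depth-first walk of the tree, appending to the criteria as it goes. */
    static void visit(Expr expr, Criteria out) {
        if (expr instanceof And) {
            And and = (And) expr;
            out.sql.append("(");
            visit(and.left, out);
            out.sql.append(" AND ");
            visit(and.right, out);
            out.sql.append(")");
        } else if (expr instanceof Compare) {
            Compare compare = (Compare) expr;
            out.sql.append(compare.field).append(" ").append(compare.op).append(" ?");
            out.params.add(compare.value);
        } else {
            throw new IllegalArgumentException("unsupported node: " + expr);
        }
    }

    public static void main(String[] args) {
        // Hand-built equivalent of: where { b > 100 && b < 5000 }
        Expr where = new And(new Compare("b", ">", 100), new Compare("b", "<", 5000));

        Criteria criteria = new Criteria();
        visit(where, criteria);

        // prints: (b > ? AND b < ?) [100, 5000]
        System.out.println(criteria.sql + " " + criteria.params);
    }
}
```

The interesting work in a real version is mapping property paths like bar.order.amount onto the O/R mapping; the walk itself stays about this simple.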

0 writebacks [/src/java/ojb] permanent link

Fri, 10 Oct 2003

Apache OJB OTM Layer Makes Unit of Work Pattern Scarily Easy

This is just because I have to shout for joy over the epiphany I just had about doing Unit of Work with the OTM layer of Apache OJB as the persistent storage back end. A tutorial is coming as soon as I have time, but first I need to go gut an application and rework it to use this new technique (so I have some sample code for the tutorial).

Raghu, Matt, everyone else who worked on OTM - thank you again!

0 writebacks [/src/java/ojb] permanent link

Tue, 30 Sep 2003

Apache OJB ODMG Tutorial Take One

Posted the first take on a new ODMG API tutorial for Apache OJB to CVS. For the Mapping and PersistenceBroker tutorials I was on solid ground, but I am not much of an ODMG expert. If you are, and don't mind contributing, I would much appreciate feedback on idiomatic usage of the ODMG APIs.

This tutorial will probably require some more round trips to the editor before I am happy with it, but once it is good I can tackle the JDO tutorial and finally switch over the documentation links to the new tutorials. That will be nice.

2 writebacks [/src/java/ojb] permanent link