Brian McCallister

Mon, 30 May 2005

Transparent Persistence, Transient and Detached Objects, and DTOs

Everything I'm going to say applies equally whether you are using OJB's ODMG or OTM layers, JDO (1 or 2), Hibernate, TopLink or someday (hopefully) JSR 220 Persistence. It's about so-called Transparent Persistence. There is a key concept which doesn't seem to be expressed clearly enough in places, so I'll take a stab using slightly different words than are typically used, and probably not make it any more clear =) If you find yourself saying "yes, that is true, but there is this case..." while reading, just keep reading. I am skipping some concepts initially in order to introduce them later. Bear with me, please.

Let's create the idea of a persistence context. A persistence context is like a virtual machine, or a dedicated heap. It is a demarcated space (don't worry how, yet) in which persistence capable objects (to borrow a term from JDO -- basically an object which has been mapped to something in the datastore, and can be persisted) can exist and be related to the datastore. Groups of objects are said to be in the persistence context, though they can certainly exist outside of the persistence context. A given object which is persistence capable, and in fact represents something with the same identity (in the platonic sense), is called an entity. Each entity within a persistence context can be thought of as being bound to the underlying datastore (typically a relational database), specifically to the collection of bits therein represented by the same identity. This is usually simply a row. These are called persistent entities. When a persistent entity has its state change, that same state in the datastore changes (yes, yes, I'll talk about write-behind later).

Now, entities can exist outside of a persistence context as well as within a persistence context. When entities exist outside of a persistence context, they are not considered to be bound to the underlying datastore. Generally speaking these entities can be called transient entities. That is an imprecise definition of transient, but it is the right one for right now. State changes on these entities are not reflected back to the datastore.

Now, in all these libraries (I consider them libraries, you call into them in general, rather than have them call into you) there is some central access point for working with persistence contexts. It may be called an EntityManager (JSR 220), a Transaction (ODMG), a Session (Hibernate), or PersistenceManager (JDO). Each one does the same thing. Each instance of the entity manager will have its own conceptual persistence context.

Now, how do objects get into that persistence context, you might ask? The most common way is that you retrieved them through the entity manager which owns the persistence context. Another way is to insert a new entity by instantiating a persistence capable object, and telling the entity manager to store it (i.e., EntityManager#persist(Object)). Nice, easy.
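To make the picture so far concrete, here is a rough sketch in plain Java. It is a toy, not any real library's API: Person, ToyDatastore and ToyEntityManager are names made up for illustration, and the "datastore" is just a map. It only shows the two ways in (retrieve, persist) and the fact that each manager has its own context.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not a real library API.
class Person {
    final long id;      // identity, think primary key of the row
    String name;
    Person(long id, String name) { this.id = id; this.name = name; }
}

// Stand-in for the datastore: one "row" (just the name) per id.
class ToyDatastore {
    final Map<Long, String> rows = new HashMap<>();
}

// Stand-in for EntityManager / Session / PersistenceManager.
class ToyEntityManager {
    private final ToyDatastore store;
    // The persistence context: at most one instance per identity.
    private final Map<Long, Person> context = new HashMap<>();

    ToyEntityManager(ToyDatastore store) { this.store = store; }

    // Way one: retrieving through the manager puts the entity in its context.
    Person find(long id) {
        Person cached = context.get(id);
        if (cached != null) return cached;
        String name = store.rows.get(id);
        if (name == null) return null;          // nothing with that identity
        Person loaded = new Person(id, name);
        context.put(id, loaded);
        return loaded;
    }

    // Way two: hand a freshly instantiated object to the manager.
    void persist(Person p) {
        context.put(p.id, p);
        store.rows.put(p.id, p.name);           // written immediately in this toy
    }

    // True only if this exact instance lives in this manager's context.
    boolean contains(Person p) { return context.get(p.id) == p; }
}
```

Asking the same manager for the same identity twice gives back the same instance, and a second ToyEntityManager over the same store knows nothing about objects held by the first.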

Okay, let's complicate it some. Every time you call a setter, it would be really craptastic to have to issue a corresponding update to the database immediately. So we introduce the idea of transactions. These are not datasource transactions, though it is a bloody good idea almost all the time to synchronize them with datasource transactions. These are more like units of work. Basically all the changes made to entities in the persistence context are saved up until you tell the entity manager to save them out to the datastore (flushing). This is write-behind. It is good, though it does make debugging weird sometimes.
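A minimal sketch of write-behind, again as a toy with made-up names (Account, WriteBehindContext). Real libraries notice changes through things like bytecode enhancement or snapshot comparison; this sketch assumes the setter simply tells the context about the change.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical context that saves changes up until flush() is called.
class WriteBehindContext {
    final Map<Long, Long> rows = new HashMap<>();   // stand-in datastore: id -> balance
    private final Set<Account> dirty = new LinkedHashSet<>();
    int updatesIssued = 0;                          // counts simulated UPDATE statements

    void markDirty(Account a) { dirty.add(a); }

    // Write every changed entity out once, then forget the change list.
    void flush() {
        for (Account a : dirty) {
            rows.put(a.id, a.getBalance());
            updatesIssued++;
        }
        dirty.clear();
    }
}

// Hypothetical entity whose setter reports changes to its context.
class Account {
    final long id;
    private long balance;
    private final WriteBehindContext context;

    Account(long id, long balance, WriteBehindContext context) {
        this.id = id;
        this.balance = balance;
        this.context = context;
    }

    long getBalance() { return balance; }

    void setBalance(long balance) {
        this.balance = balance;
        context.markDirty(this);    // no datastore write happens here
    }
}
```

Calling setBalance twice and then flush once issues a single update carrying the final value, which is exactly why the datastore can look stale while you are stepping through code in a debugger.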

Okay, now once you have write-behind you have some additional states the persistence context needs to keep track of. A dirty entity has been changed, but not flushed; a clean entity has not been changed; a deleted entity has been deleted, but the delete has not been flushed; and a new entity has been inserted, but the insert hasn't happened on the database yet.
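Those four states can be sketched as a tiny state tracker. This is only an illustration under my own assumptions: EntityState and StateTracker are invented names, and the exact transitions (for example, a changed new entity staying new) are choices this toy makes, not something any particular library promises.

```java
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical names throughout; no real library exposes exactly this.
enum EntityState { NEW, CLEAN, DIRTY, DELETED }

class StateTracker {
    // Keyed by object identity, since the context tracks instances.
    private final Map<Object, EntityState> states = new IdentityHashMap<>();

    void loaded(Object e)   { states.put(e, EntityState.CLEAN); }
    void inserted(Object e) { states.put(e, EntityState.NEW); }

    void changed(Object e) {
        // Only a clean entity becomes dirty; a new one still just needs its insert.
        if (states.get(e) == EntityState.CLEAN) states.put(e, EntityState.DIRTY);
    }

    void deleted(Object e) {
        // Only entities already in the context can be marked for deletion.
        if (states.containsKey(e)) states.put(e, EntityState.DELETED);
    }

    // Returns null when the object is not in this context at all.
    EntityState stateOf(Object e) { return states.get(e); }

    // After a flush, deleted entities are gone and everything else is clean.
    void flush() {
        Iterator<Map.Entry<Object, EntityState>> it = states.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, EntityState> entry = it.next();
            if (entry.getValue() == EntityState.DELETED) it.remove();
            else entry.setValue(EntityState.CLEAN);
        }
    }
}
```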

Just like you have different variations on the persistent state, we need to introduce some similar variations on the transient state. The first is to redefine transient to be a bit more precise. A transient instance lacks identity. It does not represent anything in the database. A variation on a transient instance is a detached instance -- this is an instance which has identity, but is not part of a persistence context. Changes made to detached instances are not reflected in the datastore. There is one last variation, hollow, which is what happens when you have a persistence capable object that has identity, but no state. It only really crops up in JDO where it is used to avoid all kinds of confusing things that lead to hard to track down bugs in Hibernate (this is not a jab, it is simply true, the hollow state may be good, may be bad -- Hibernate gets a lot of use out of its lack of hollow).
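The transient / detached / persistent split boils down to two questions: does the object have identity, and is it in a persistence context? A rough sketch, assuming (my assumption, for illustration only) that "has identity" means a non-null id field. Book, Lifecycle and Lifecycles are made-up names, and hollow is left out.

```java
import java.util.Set;

// Hypothetical classification; hollow is deliberately not modelled.
enum Lifecycle { TRANSIENT, DETACHED, PERSISTENT }

class Book {
    Long id;            // null means "no identity yet" in this toy
    String title;
    Book(Long id, String title) { this.id = id; this.title = title; }
}

class Lifecycles {
    // context is the set of instances currently held by one persistence context.
    static Lifecycle classify(Book b, Set<Book> context) {
        if (context.contains(b)) return Lifecycle.PERSISTENT;
        // Outside the context: identity decides between detached and transient.
        return b.id == null ? Lifecycle.TRANSIENT : Lifecycle.DETACHED;
    }
}
```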

Now, people hate staying in the lines, so they like to be able to do things like re-attach detached instances, and do all kinds of bizarre and kinky things with transient objects. Re-attaching a detached instance requires making a detached instance. Some libraries provide a means of explicitly detaching objects, others just do it by default to all persistent entities in a persistence context when the entity manager owning the persistence context closes its connection to the datastore. They all (or all will with JDO 2) provide a means of re-attaching, with varying semantics on the details of how it gets done inside the persistence system. The end result is *mostly* the same.
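Here is one way the detach-on-close and re-attach dance could look, as a toy. Customer and DetachingContext are hypothetical names, and the re-attach semantics shown (adopt the instance if the identity is unknown to the context, otherwise copy its state onto the instance already there) are just one of the variations; real libraries differ on exactly these details.

```java
import java.util.HashMap;
import java.util.Map;

class Customer {
    final long id;
    String email;
    Customer(long id, String email) { this.id = id; this.email = email; }
}

// Hypothetical context that detaches everything when it is closed.
class DetachingContext {
    private final Map<Long, String> rows;     // shared stand-in datastore: id -> email
    private final Map<Long, Customer> managed = new HashMap<>();

    DetachingContext(Map<Long, String> rows) { this.rows = rows; }

    Customer find(long id) {
        Customer c = managed.get(id);
        if (c == null && rows.containsKey(id)) {
            c = new Customer(id, rows.get(id));
            managed.put(id, c);
        }
        return c;
    }

    // Writes the state of every managed entity back to the datastore.
    void flush() {
        for (Customer c : managed.values()) rows.put(c.id, c.email);
    }

    // Closing flushes, then lets go of every entity: they are now detached.
    void close() {
        flush();
        managed.clear();
    }

    // Re-attach: returns the instance that is managed afterwards.
    Customer reattach(Customer detached) {
        Customer current = managed.get(detached.id);
        if (current == null) {
            managed.put(detached.id, detached);
            return detached;
        }
        current.email = detached.email;       // copy state onto the managed instance
        return current;
    }

    boolean isManaged(Customer c) { return managed.get(c.id) == c; }
}
```

Changes made while the object is detached go nowhere until some context takes it back and flushes.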

Okay, back to our way of looking at it as a persistence context. There are explicit ways of passing objects across persistence context boundaries via the entity manager -- inserts and re-attachment being the ways this happens. Now, frequently you attach, or insert, graphs of objects at a time, so when you call the relevant methods on the entity manager, you get the operation applied to the whole graph (or variously subsets of it, as with Hibernate's different cascades). All well and good so far.

Let's poke the edges that burn people, using this way of thinking about it.

First, what happens if you have an entity in a persistence context with an object reference to something outside the context when it is flushed? Big fat error. Don't cross persistence context boundaries except through calls to the entity manager. ...
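A toy version of that failure, with made-up names (Author, Post, BoundaryCheckingContext) and an IllegalStateException standing in for whatever error a real library raises. The sketch assumes no cascading, so every object has to be handed to the context explicitly.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class Author {
    final String name;
    Author(String name) { this.name = name; }
}

class Post {
    final String title;
    Author author;      // object reference that may point outside the context
    Post(String title, Author author) { this.title = title; this.author = author; }
}

// Hypothetical context that refuses to flush across its own boundary.
class BoundaryCheckingContext {
    private final Set<Object> managed =
            Collections.newSetFromMap(new IdentityHashMap<>());

    // The only sanctioned way in: through the context itself.
    void persist(Object o) { managed.add(o); }

    void flush() {
        for (Object o : managed) {
            if (o instanceof Post) {
                Author a = ((Post) o).author;
                if (a != null && !managed.contains(a)) {
                    throw new IllegalStateException(
                            "Post '" + ((Post) o).title + "' references an Author outside the context");
                }
            }
        }
        // A real flush would write the managed entities out here.
    }
}
```

Persisting the Post alone and flushing blows up; persisting the Author through the same context first makes the flush go through.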

This has been sitting on my HD too long without being finished, so posting incomplete before I head out for honeymoon. Will eventually finish the thought processes, maybe =)

Brian's Waste of Time