Brian's Waste of Time

Thu, 26 Apr 2007

H2 Performance... "Notes"

This needs to be highlighted. Thomas Mueller, the creator of H2, likes to point out the huge performance difference between H2 and other databases. It is extremely important to note that the main driver behind the performance difference is the fact that H2 explicitely forgoes durability. It does not sync data to disk on commit. Bears repeating, H2 does not sync data to disk on commit. You can force it to, in which case the performance hits the same bottleneck every other database hits, disk io:

-- 1024 bytes, 1000 times --
    derby:		2192 millis
    bdb:		1849 millis
    h2: 		2221 millis
    h2b:		2351 millis
-- 2048 bytes, 1000 times --
    derby:		2199 millis
    bdb:		2129 millis
    h2: 		2578 millis
    h2b:		2414 millis

The numbers here are running simple inserts and deletes by identity (so that I can include BDB JE). Shockingly, the numbers are all basically the same once you force H2 to sync to disk (executing a "CHECKPOINT SYNC" statement after every DML operation).

H2 is an awesome little database, but if you care about storing your data, please read the fine print where it notes that H2 is not ACID. It is just ACI.

writebacks...

The Anonymous Coward


This does not seem important in the most common case for embedded databases, which are read-only. If I wanted ACID I would go with Oracle anyway.

Brian McCallister

Re: Read only...
That is one use, but is quite far from the only.

Thomas Müller

ACI like all others
As it is written in the 'fine print', all other databases are not ACID as well (meaning, if you actually test it, the commits are also 'not always' persisted if you pull the plug just afterwards). So the performance comparison is not apples to oranges, it is apples to apples. The only difference is: H2 actually admits there is a problem with durability and documents it, while most other databases just ignore it more or less, or even claim they are ACID while they are not. PostgreSQL ignores it, in MySQL it is documented, HSQLDB works the same way as H2 (however 20 seconds delay by default), Derby people claim their database forces writes, but it does in fact not if you run a test... You can run the 'ACID' test yourself if you want to, the source code is included in H2 (org.h2.test.poweroff.Test as documented). The only solution (if you really need guaranteed durability for each commit) is: Use a cluster. H2 supports clustering.

Brian McCallister


Thomas, I think there is a big difference between writes being flushed to the level that the OS and disk controller allow versus a twenty second float. There are a lot of places where H2's model is great -- I (we) use it in a couple of those, but it is the opposite of what people *expect* from a database, and that distinction is important. It is 'fine print' when such a core distinction is an "Advanced Topic" at the bottom of the navigation :-) Your database rocks, and to do justice I should post the "default durability" numbers for the above benchmark (h2 thrashes everything else, predictably), but that would be apples to oranges as you put it. The comparison I did was as apples to apples as it gets -- each is pushing the data to the hardware to the level that the disk controller/OS allows.

Thomas Mueller


I understand I need to change the docs so that it is easier to understand the problem and the default behaviour of H2. The default delay for H2 is one second, by the way, for HSQLDB it is 20 seconds (AFAIK). The default for Derby is write without delay, but if you test it about 50% of the time (my tests, my hardware) a few transactions are lost when power is lost. PostgreSQL the same. The default for MySQL is call fsync (making MySQL very slow by default). In the benchmark, I have changed the setting for MySQL as documented (I think for PostgreSQL as well). Thomas

Fancois Orsini


Thomas, I disagree - A database system should do commit writes without delay (whether the write happens at the file system or not [raw devices or Direct I/O]. The quicker you write log records to disk, the lesser corruptions will likely to happen. It makes even more sense if your applications which embeds the DB crashes itself or the JVM - Derby writes without delay, so hence the I/O has been issued to the FS and Disk Controller, it will happen, if you have a delay, well it will NOT - This is pretty obvious. Yes, you could argue that running with a UPS I should not have any problems but it is not right as an application can always crash and having been able to flush log records because there is a delay in the case of H2 and HSQLDB.

Fancois Orsini


My last sentence should have read: "you could argue that running with a UPS I should not have any problems but it is not right as an application can always crash and NOT having been able to flush log records because there is a delay in the case of H2 and HSQLDB."

David Van Couvering


Also, losing a few transactions while maintaining a consistent database is different from ending up with a corrupted databases. I don't know about H2, but with HSQL, if you power down your machine while it's running, your DB can easily get corrupted (quoted to me by a user who experienced this regularly and moved to Derby).

Øystein Grøvlen


In order to ensure durability with any database system, caching in the disk controller needs to be turned off. If you lose transactions with PostgreSQL or Derby, I would suspect that you are running with the disk cache enabled. (That is default on most PCs).

Thomas Müller


Yes, disk cache is enabled on most systems, and additionally even if it is turned off, hard drives do not always write when asked to. See 'Your Hard Drive Lies to You' http://hardware.slashdot.org/article.pl?sid=05/05/13/0529252&tid=198&tid=128). So what is the point of 'commit writes without delay', if the hard drive then anyway doesn't do it (even if you switch off caching in the disk controller)? Not only it will be very slow if done right (by calling FileDescriptor.sync() or FileChannel.force() at each commit, maximum committed transactions per second is 30), or done half backed (sorry) like Derby and PostgreSQL (open the file using RandomAccessFile(.., "rws"/"rwd"): does not call fsync). If Derby and HSQLDB can get corrupted if the hard drive doesn't flush writes, that's a bug in those databases. H2 doesn't get corrupted in this case. The default for H2 is to delay commits one second (you can change that). So a bit more than one second of transactions may be lost by default for H2. For other databases, more (HSQLDB: 20 seconds) or less (Derby: sometimes none, sometimes one or two in my test) transactions are lost. A good way to make sure no transactions are lost is using a cluster. With a UPS you still have the 'single point of failure' problem. For the benchmark results posted on the H2 web site: For Derby, there is no option to change the delay as far as I know. The write delay for PostgreSQL and MySQL was changed to one second (so the benchmark is apples to apples for those databases). So the reason why H2 is fast is not the write delay.

comment...

 
Name:
URL/Email: [http://... or mailto:you@wherever] (optional)
Title: (optional)
Spam Guard, translate l33t to English: (hint, it's an Australian animal, plural form)
Comments:
Save my Name and URL/Email for next time