optimization – Dazed & Confused

Since I started working way back in the summer of 2004 I’ve been working with ORM technologies. First TopLink, and then Hibernate. I like ORM technology, but I also see the associated problems. They are incredibly complex products, and not having some resources that really understands what is happening will get you into serious trouble. Usually the problems surface with performance, other times, just strange results occur. Even if you are lucky enough to have experts at hand, the general understanding amongst your developers continues to be a problem.

I would love to have an easier alternative, but I don’t really see anything on the horizon that replaces it without removing too many of the benefits. But that’s really something for a different post.

Some basics

So if you’re doing ORM, and especially Hibernate there are some basics to keep your options open when it comes to performance:

Don’t do explicit flushing
Don’t disable lazy loading in your mappings
Don’t use Session.clear()

There are valid reasons for doing these things, but usually they are used because someone does not quite understand how Hibernate really works. And what’s even worse; they limit your choices later on when tuning and changing the solution. So if someone are experiencing problems and consider doing one of these things; make sure they talk to your main Hibernate guy first. I can guarantee you there is a lot of pain involved in removing these later on.

So if you’ve been a good boy or girl and avoided these pitfalls you should be set to do some performance optimisations in the areas reported as slow. Yeah, reported as slow. Don’t do any funky stuff before you know it is an area that needs improvement. Doing special tuning will limit your options later on, so only do it where it’s really necessary.

Diagnosing the problem and finding the culprit

YourKit Java Profiler is my absolute favourite for diagnosing problems related to Hibernate. It enables you to see all queries executed, and trace it back into the code to figure out why it is run.

Trond Isaksen from Zenior also held a talk at Capgemini last week where he talked about using stacktraces in core dumps for analyzing problems. It might actually be your only option, because introducing Yourkit in production will cause side effects.

The amount of information can become quite overwhelming, but learn these tools and you will have a trusty companion for diagnosing your database performance in Java for a long time to come.

Hibernate performance

The way Hibernate basically promises to deliver performance is through caching and changing the amount of data that is fetched each time. This works for most of the cases where you use Hibernate. If not, you might just need to do some good old SQL.

Understand 1st and 2nd level caching and figure out how you can tweak relations to change behaviour and you have a good tool set to tune Hibernate with.

Fixing issues

Once you find the reason for your performance problem, and if it’s database or Hibernate related, you basically have the following options. I try to follow this list top to bottom. The last solutions impact your code more and can also complicate your deployment and infrastructure.

Check that your database indices are tuned. To have effective fetches your indices must match your actual queries, so make sure they are correct.
Consider how you do key generation. If inserts are slow, it might be because it calls the Database for a key for each and every row it intends to insert. Change generators or assign yourself. Stuff like thr Sequence HiLo Generator can drastically reduce the number of queries Hibernate does to your database.
Fetch on primary key whenever possible. Hibernate has some default methods called get/load that lets you retrieve an object based on the primary key. These methods checks with the first level cache whether the object has been retrieved within the same Hibernate session, and if so avoids database communication. So if you use these you will only get one call to the database, even though your code actually calls Hibernate multiple times. Using Queries will bypass this mechanism, even though you query on primary key.
Enable second level cache for Read Only entities. This is a really good quick win for stuff like Currency or Country. Close to zero cost.
Consider wether you are always using a set of objects at the same time. You rarely retreive an Order without looking at the underlying Items in the Order. Setting fetch=”join|select|subselect” or batch size on the relation can increase the speed. Note that this will then happen every time you fetch the Order. It will also effectively bypass any caches you have enabled, so make sure you consider all the usage scenarios for this.
Write custom queries for the situation. Setting the fetch mode in an association as in the previous section will impact every fetch of the Order object. If there’s only a few cases where the performance gets really bad and that is separated from other parts of the system, you can write a custom query. This enable you to tune the fetching to the concrete case and let other parts of the system actually benefit from lazy loading. This is preferably custom Hibernate Criteria, but can also be HQL or even SQL.
Use plain old SQL. There is actually things that SQL is better at. Use it, and use something like the RowMapper feature in Spring with it.
Refactor your code to enable better performance. Changes in the model, or the design of services and request can affect performance and might be the way to go. Especially consider the flushing rules for Hibernate. Making sure you read the information at the start of your transaction can reduce the number of times Hibernate writes to the database.
Write cache your objects. This can become quite complex because of synchronization issues. If you’re running more nodes (most projects are), you’ll need to set up synchronization between your nodes and caches. This reduces scalability and complicates setup and deployment. Keep it simple.

Let me know if there’s something I’ve forgotten, I’m still learning. 🙂

Some resources:

I read this post about Clover and using it to minimise the number of tests run. A nice idea, so I decided to have a go at it.

What it does is use the test-coverage that it was originally written to do, to figure out which tests exercise which classes. So when you change Class1 and Class2 it knows which that Test1, Test2, Test3 and Test4 need to be rerun to check if you have broken anything.

I did an initial test on our build which takes almost 10 minutes. I’m of the slightly paranoid type so I like to run the full build to verify my changes, at least before I check in. With changes in one class Clover figured it should rerun about 20 tests, and ran in about 4 minutes. That’s 6 minutes saved many times a day for each developer. We’re not using it at our build server just yet, but trying to save time for each developer before commit.

Any downsides? Of course. The initial time (mvn clean removes all info) to build is almost doubled for my project. You won’t catch all errors, and updates to dependencies won’t be caught. And there is a problem handling deleted classes. For some reason it creates a optimized-src directory which is a copy of all your sources and compiles from there. If you delete a class in your src folder it won’t be deleted in optimized-src and you could get compilation errors. After some initial tests it seems these problems are bigger in theory than they are in real day situations.

Using Clover should be no excuse bad tests though. I know all too well how I can really mess up my own tests. The long test times often stem from our inability to focus on testing the logic separately from infrastructure as databases or external services. And of course loading the Spring context is done way too much. But that’s stuff for another post.

But even if you have good unit tests there will be time to save. And everything that can keep me from getting distracted when developing is a good thing.

I’ll give this a good run until the 30 day trial licence expires, and maybe invest. Maybe we will see something similar in Cobertura too…

Update: It looks like it is also activated when doing mvn eclipse:eclipse . Because it redefines the source folders to target/clover/src-optimized this is also what is written in you Eclipse project. So be sure to have an easy way to disable running if you’re going to use this.

Share this:

Share this: