Category Archives: Java

Write awesome CLIs!

It’s time to start writing kick ass CLIs instead of hacking scripts! 🙂 It’s a lot easier than you might think.

If you’re impatient just scroll to the bottom for a link to the code in Github. 🙂

All those scripts

I see a lot of scripts around, but they usually suffer from many of these problems:

  • Missing or bad error handling
  • Limited input validation
  • Clumsy parameter handling
  • No testing, so every change requires testing all input combinations. Not to mention different state on the hard drive.
  • Copy and paste code. It’s hard to re-use libraries in scripts, even though a lot exists.
  • Implicit dependencies to OS and OS packages

I’ve done way too much of this in my time, and I have felt the pain of maintaining 16k lines of Bash code (I know, stupid). So I started looking for something better…

What I wanted

Coming from the developer side of things I’m really used to making third party libraries do a lot of the heavy lifting for me. It felt really awkward that there were no proper way to do this when creating tools for the command line. So I set out to look for:

  • A good way to define the Command Line Interface
  • Proper error handling
  • Test frameworks to enable automated testing
  • A way to package everything together with dependencies

In addition; I really wanted to do some automated testing. I hate writing code without knowing instantly that it performs as I believe it does. You might be differently inclined. 😉


There are many ways of doing this, but the only ones I’ve been able to get some real experience with are Python and Java. I would really like to learn Go, but it’s usually not politically viable and would take some time to learn.

I did maintain and develop a CLI in Python for a good while. And I really like the Python  language and all the awesome third party libraries available. But I always found it lacking in the distribution part. We were under certain (networking) constraints, so downloading stuff from PyPi was NOT and option. It took quite a lot of hacking with Virtualenv and Pip to set up some kind of infrastructure that enabled us to distribute our CLI with it’s dependencies. YMMV. 🙂

But all these hoops we were jumping through with Python made me think about what is great about Java. The classpath. 😉 Yeah, I know, I know. You all hate the classpath. But that’s because it’s been abused by the Java EE vendors through all these years. It’s really quite awesome, just make sure you take full control of it.

Java with some help from friends (see the details further down) would let me package it all up and create a truly cross platform single binary with all dependencies included (JRE required)! It even starts fast! (Unless you overload it with all kinds of Spring+Hibernate stuff. That’s on you.). And even though it sounds like something a masochist would do; it is actually kick ass. Try it. 🙂

If you don’t do this in Java; use Docopt (available in many languages). You should write CLIs and keep your build tool simple (dependencies, versioning, packaging).  I’ve seen way too much tooling shoe horned into different build tools. Write CLIs for the stuff not related to building and use the right tool for the right task.

Test a Java CLI

If you just want to try how fast and easy it actually works (you’ll need a JRE on your path):

$ curl > ~/bin/json-util && chmod u+x ~/bin/json-util
$ json-util
  json-util animate me
  json-util say [--encrypt=] 

In case you did not catch that:

  • We downloaded a jar, and saved it as a regular binary and saved to ~/bin.
  • We executed it and didn’t give any parameters so it printed the help text.

It’s a really simple (and stupid) example, but to invoke some “real” functionality you can do:

$ json-util say "Hello blog!"
Hello blog!

$ json-util --encrypt=rot13 "Hello blog!"
Uryyb oybt!

Neat! Write the utils you need to be effective in a language you know, with the tooling you know (this util is created with Maven). And write some F-ing tests while you’re at it. 😉

Tell me more, tell me more…

The things that makes writing CLIs in Java fun, easy and robust is:

  • Java. 🙂 Alright, alright. Maybe not the best language for this stuff. But the new IO APIs and the Streams with Lambdas in Java 8 helps a lot. And it’s typed… If you’re into that kind of stuff. 🙂 You can of course do this in Groovy or anything else that runs on the JVM, but be aware that many of those languages takes some time to bootstrap and you’ll notice that every time you run the CLI.
  • Maven-shade-plugin. It packages your code together with all the dependencies to one binary.
  • Maven-really-executable-jars-plugin. It modifies the single jar with a zip-compliant header that lets you skip the “java -jar …” part of executing it every time.
  • Docopt-java. It makes writing, validating and parsing command line arguments extremely easy and fun.
  • Docopt-completion. Once you have your kick ass CLI, add some kick ass tab-completion. 😉

Show me code!

You can see an example of all of this (Java and Maven required) at: .

Hibernate performance and optimization

Since I started working way back in the summer of 2004 I’ve been working with ORM technologies. First TopLink, and then Hibernate. I like ORM technology, but I also see the associated problems. They are incredibly complex products, and not having some resources that really understands what is happening will get you into serious trouble. Usually the problems surface with performance,  other times, just strange results occur. Even if you are lucky enough to have experts at hand, the general understanding amongst your developers continues to be a problem.

I would love to have an easier alternative, but I don’t really see anything on the horizon that replaces it without removing too many of the benefits. But that’s really something for a different post.

Some basics

So if you’re doing ORM, and especially Hibernate there are some basics to keep your options open when it comes to performance:

  • Don’t do explicit flushing
  • Don’t disable lazy loading in your mappings
  • Don’t use Session.clear()

There are valid reasons for doing these things, but usually they are used because someone does not quite understand how Hibernate really works. And what’s even worse; they limit your choices later on when tuning and changing the solution. So if someone are experiencing problems and consider doing one of these things; make sure they talk to your main Hibernate guy first. I can guarantee you there is a lot of pain involved in removing these later on.

So if you’ve been a good boy or girl and avoided these pitfalls you should be set to do some performance optimisations in the areas reported as slow. Yeah, reported as slow. Don’t do any funky stuff before you know it is an area that needs improvement. Doing special tuning will limit your options later on, so only do it where it’s really necessary.

Diagnosing the problem and finding the culprit

YourKit Java Profiler is my absolute favourite for diagnosing problems related to Hibernate. It enables you to see all queries executed, and trace it back into the code to figure out why it is run.

Trond Isaksen from Zenior also held a talk at Capgemini last week where he talked about using stacktraces in core dumps for analyzing problems. It might actually be your only option, because introducing Yourkit in production will cause side effects.

The amount of information can become quite overwhelming, but learn these tools and you will have a trusty companion for diagnosing your database performance in Java for a long time to come.

Hibernate performance

The way Hibernate basically promises to deliver performance is through caching and changing the amount of data that is fetched each time. This works for most of the cases where you use Hibernate. If not, you might just need to do some good old SQL.

Understand 1st and 2nd level caching and figure out how you can tweak relations to change behaviour and you have a good tool set to tune Hibernate with.

Fixing issues

Once you find the reason for your performance problem, and if it’s database or Hibernate related, you basically have the following options. I try to follow this list top to bottom. The last solutions impact your code more and can also complicate your deployment and infrastructure.

  1. Check that your database indices are tuned. To have effective fetches your indices must match your actual queries, so make sure they are correct.
  2. Consider how you do key generation. If inserts are slow, it might be because it calls the Database for a key for each and every row it intends to insert. Change generators or assign yourself. Stuff like thr Sequence HiLo Generator can drastically reduce the number of queries Hibernate does to your database.
  3. Fetch on primary key whenever possible. Hibernate has some default methods called get/load that lets you retrieve an object based on the primary key. These methods checks with the first level cache whether the object has been retrieved within the same Hibernate session, and if so avoids database communication. So if you use these you will only get one call to the database, even though your code actually calls Hibernate multiple times. Using Queries will bypass this mechanism, even though you query on primary key.
  4. Enable second level cache for Read Only entities. This is a really good quick win for stuff like Currency or Country. Close to zero cost.
  5. Consider wether you are always using a set of objects at the same time. You rarely retreive an Order without looking at the underlying Items in the Order. Setting fetch=”join|select|subselect” or batch size on the relation can increase the speed. Note that this will then happen every time you fetch the Order. It will also effectively bypass any caches you have enabled, so make sure you consider all the usage scenarios for this.
  6. Write custom queries for the situation. Setting the fetch mode in an association as in the previous section will impact every fetch of the Order object. If there’s only a few cases where the performance gets really bad and that is separated from other parts of the system, you can write a custom query. This enable you to tune the fetching to the concrete case and let other parts of the system actually benefit from lazy loading. This is preferably custom Hibernate Criteria, but can also be HQL or even SQL.
  7. Use plain old SQL. There is actually things that SQL is better at. Use it, and use something like the RowMapper feature in Spring with it.
  8. Refactor your code to enable better performance. Changes in the model, or the design of services and request can affect performance and might be the way to go. Especially consider the flushing rules for Hibernate. Making sure you read the information at the start of your transaction can reduce the number of times Hibernate writes to the database.
  9. Write cache your objects. This can become quite complex because of synchronization issues. If you’re running more nodes (most projects are), you’ll need to set up synchronization between your nodes and caches. This reduces scalability and complicates setup and deployment. Keep it simple.

Let me know if there’s something I’ve forgotten, I’m still learning. 🙂

Some resources:

Smidig 2009 and my talk

Smidig 2009, Norways very own Agile conference was held on October 22nd and 23rd. I have attended it earlier, and this year I was one of the organizers. The entire experience has been excellent, though a lot of work. Others were working a lot more than me, and I reall have to give them credit for beeing such a positive and active crowd.

Tandberg did an excellent effort of providing us with the video equipment, and enabled us to make the videos available. Even though sometechnical difficulties resulted in loosing some talks (mine included), they did an excellent job, and a big thank you to them.

I held a talk on agile deployment (again), due to some last minute cancellations.

Because my talk was lost I decided to put some details here on how you could get the material if you are interested. So:

My own company, Capgemini had 7 talks, which was a really good effort. To see other videos (Norwegian only) go to .

My Javazone talks

My JavaZone talks are available. Sorry, only in Norwegian. I’ve scanned through them and I’m fairly satisfied with the performance. Looks like the Rules engine talk doesn’t have video of us up on the stage. Not sure what happened.

I’ll have to go through them for a little retrospective later. Let me know if you have some feedback.

Here they are:

JavaZone 2009 over

So the big event for Java geeks in Oslo, JavaZone, is over. I had a blast as always, and a little less people. Moving the overflow area helped a lot. The less people was a conscious choice I’ve been told, and I wouldn’t really mind even fewer. But hey, I guess there’s some economics that has to work too.

My company, Capgemini was present as usual, and the bright girls and guys did an excellent job of tweeting and blogging from the conference. If you know norwegian check out our twitter stream and the technology blog.

We also had two full feature talks, and I was really satisfied with how they went. Always something to change, but over all very satisfied. The topics were (sorry only Norwegian slides):

  • Smidig Utrulling (Agile deployment) – Slides
  • Rules engine vs. Domain logic – Slides

For the Smidig Utrulling talk I spent a lot of time creating code and examples. They contain some simplified Maven setup and a Java program to do deployment from the Maven repo. Check it out at . Documentation is scarce as always, but let me know if I can make something better. And feel free to use.

The videos will be available later, but if you want to see the ones that are available (of others, some english speakers) you can check them out at . Tandberg delivered the video equipment and it seems like they are doing a hell of a job for the community. Check out the videos, there’s a lot of good talks there.

Now for some vacation, see you. 🙂

Java migrations tools

Wow, it’s been a while. If you’re interested in good links follow me on Twitter: . I usually update there these days.

My talk on “Agile deployment” got accepted for JavaZone this year! I’m extremely happy, but a bit scared too. 🙂 I’ll be talking about rolling out changes in a controlled manner, and one of the things that are usually neglected in this scenario is the database side. I’ll cover stuff like packaging and deploy of the application too, but that’s probably the area where I know the least. The database side of things are really sort of my expertise.

I have written some blog posts on this already, and in relation to the talk and things at work I did a quick search for Java migration tools. DBDeploy I have used earlier, but there are now a couple of other contenders. Here’s my list so far of tools that work on sql deltas that can be checked into SCM:

  • DBDeploy – Tried, few features but works well. Ant based.
  • DbMaintain – Probably has the most features. Ant based.
  • c5-db-migration – Interesting alternative, similar to DBDeploy. Maven based.
  • scala-migrations – Based on the Ruby on Rails migrations. Interesting take.
  • migrate4j – Similar to Scala Migrations, but implemented in Java.
  • Bering – Similar to Scala Migrations, and looks a lot like Migrate4J

I’ll definitely be looking into DbMaintain and c5-db-migration soon. DbMaintain looks promising, or I migh just contribute to DBDeploy some features. I’ll let you know how it went. 🙂

(updated with scala-migrations, bering and migrate4j after first post)

Agile deployment talk retro

On wednesday I did a talk at the Norwegian Java User Group about agile deployment. The slides (in norwegian) are available here as well as embedded on the bottom of this post.

From the comments and questions I got afterwards, I could see how I should have included more detail. That would have made it even more interesting for that kind of crowd. I probably also should have clearified that I had limited time to prepare and that this was just a slightly extended version of a lightning talk I held at XP Meetup last year. I hope to get the chance to correct this in a JavaZone talk with more details. If you did see the talk and have comments please do leave them at the bottom of the page. 🙂

Many of the questions I got revolved around the handling of the database, so I just thought I should give some pointers here to articles that better describes what I have been up to:

Check them out if you’re curious.

SOA Myths

I did tweet this, but it was just so spot on I had to put it up here. You can read the full 7 SOA myths article here. Some highlights:

  • SOA is a way of thinking about integration, not a product.
  • Semantic coupling will always be difficult regardless of the integration technology you use.
  • Discover the services through use of the systems, not by up front design.
  • SOA is a continuous process that will never get done.

Repository pattern and the domain

Ever since I read the Domain Driven Design book by Eric Evans I have been wondering a bit about the Repository pattern. Especially the part where a domain object can communicate with a repository. In most Java systems today you will have Hibernate with lazy loading which enables you to think in domain objects, but there might be some situations where you can’t do that. That’s when you need to think about the Repository pattern and how to use it. Reasons for using it might be:

  • Large collections, you need to filter before loading for current operation
  • External data, it is only available through calling a webservice etc. meaning Hibernate won’t handle the lazy loading for you

So basically you will need to do this in most systems some time, how do you solve this? I had a discussion about this with Johannes a while back, and we didn’t really reach a conclusion. There are several ways to handle this and they all have their benefits and problems. Terminology and some more ideas borrowed from Ben Hutchinson in this post.

Independent domain

The domain has no notion about repositories, but navigate through properties in the domain. This can be achieved through:

  1. Always passing enough data into methods so the object can make it’s decisions. This can bloat the signature and be a bit of a hassle, but probably the way to go with least magic.
  2. AOP on the methods that retrieve the data. This enables you to have just plain Java for tests, but enable a different retrieval at runtime. Complicates the understanding and testing of the system, but eases unit testing.
  3. Events in your persistence layer. When Hibernate loads up an object for you it could inject the external object. Sort of  light weight AOP, but not something I think I would do.

Co-dependent domain

The domain knows that it must retrieve information through a service/repository and calls it when it’s needs the information. It can be achieved through:

  1. Passing the service/repository in with the method call. Again bloats the methods, but is explicit.
  2. Using a Locator to get the service/repository. The dreaded ServiceLocator from Java EE is back! 😉 But it might have a worse reputation than it deserves. A Locator that checks the ThreadLocal and knows how to retrieve the repository is quite flexible even in Tests. Not very expressive on how and where the information can be found.
  3. Passing the context in the method calls. Just about the same as passing the service/repository, just another level of indirection that I don’t like.
  4. Injecting the repository/service into the object before execution. Somewhat like a middle ground between the Locator and passing it in as an argument.


It’s like everything else in computer science, you need to decide what works best in your project. I do however prefer the Independent Domain with passing in the required data in the method. I guess the reason for this is the uttrely horrible code I have seen before where domain and repositories/services are mixed together. A lot of programmers will tend to create integration tests when faced with testing domains that has dependencies to repositories. In many cases that means loading your Spring context waaaay too often, or mocking the repository. And neither are really good unit tests, as well as beeing harder to maintain. I hate mocking even though I have to do it often. I just try to avoid it if I can by designing logic that is independent of infrastructure.

Other references

DDD and generic repositories

Reuse is good. That’s what we learn, so we all strive to reuse code, but sadly sometimes we over generalize and try to be too clever. I have had the suspicion that creating generic repositories that takes criteria/queries as input parameters is one such over generalization. Some things, in fact a lot of things, are worth making explicit.

As Greg Young points out in this excellent article, this over generalization can even hurt you when it comes to tackling performance at a later stage. It’s well worth the read. 🙂