Java migrations tools

Wow, it’s been a while. If you’re interested in good links follow me on Twitter: . I usually update there these days.

My talk on “Agile deployment” got accepted for JavaZone this year! I’m extremely happy, but a bit scared too. 🙂 I’ll be talking about rolling out changes in a controlled manner, and one of the things that are usually neglected in this scenario is the database side. I’ll cover stuff like packaging and deploy of the application too, but that’s probably the area where I know the least. The database side of things are really sort of my expertise.

I have written some blog posts on this already, and in relation to the talk and things at work I did a quick search for Java migration tools. DBDeploy I have used earlier, but there are now a couple of other contenders. Here’s my list so far of tools that work on sql deltas that can be checked into SCM:

  • DBDeploy – Tried, few features but works well. Ant based.
  • DbMaintain – Probably has the most features. Ant based.
  • c5-db-migration – Interesting alternative, similar to DBDeploy. Maven based.
  • scala-migrations – Based on the Ruby on Rails migrations. Interesting take.
  • migrate4j – Similar to Scala Migrations, but implemented in Java.
  • Bering – Similar to Scala Migrations, and looks a lot like Migrate4J

I’ll definitely be looking into DbMaintain and c5-db-migration soon. DbMaintain looks promising, or I migh just contribute to DBDeploy some features. I’ll let you know how it went. 🙂

(updated with scala-migrations, bering and migrate4j after first post)


The new guy and his database

This is a followup to two previous articles about Agile databases and Migrations for Java. It tries to examplify some of the stuff I talk about in those two articles. Here we go again… 🙂

You won’t get a new developer each week, but the scenario will help illustrate how the tools I have been talking about works. The examples below are loosely based on the previous setup we had in a previous project, but should be general enough to give you an idea. This basically means SubVersion, Maven, Oracle and Ant.

So you have a new developer. Let’s call him John. He’s quite nervous the first day of work, and you wan’t to pair program with him to get him right into coding. Sure he could read the architecture documents, but you will touch upon most of the architecture by working together, so you’d rather just get started.

You sit down with the new developer, and tell him to check out the project from SubVersion.

svn co http://companyrepo/project/trunk project-trunk

The checkout pulls down several Maven projects, with a common parent POM. First things first, so you compile everything and make sure he can use it in Eclipse.

mvn install eclipse:eclipse -DdownloadSources=true

Depending on your location downloading dependencies can take a while. So showing him the coffe machine would probably be a good thing right about now. Everything compiles, and he imports it all into Eclipse.

Now he is eager to have a look at the application, but like most applications your application needs a database. You could probably have settled for HsqlDB or H2 in a test setting, but I prefer to do manual testing on the product that we are actually going to deploy to in production.

So you need to initialize a database. One of the sub-projects you checked out earlier is actually a separate database project. Inside this project is a folder where the scripts for the database resides. On your wiki he finds a description on how to initialize a new database. From the base project he does:

  1. cd dbproject/src/sql/baseline
  2. sqlplus sysadm/syspw@//db:1521/service @create_new_schema.sql johnuser testpw
  3. sqlplus johnuser/testpw@//db:1521/service @baseline_data.sql
  4. sqlplus johnuser/testpw@//db:1521/service @test_data.sql
  5. cd ../../ (takes him to the dbproject folder)
  6. ant dbdeploy-upgrade -Ddb.user=johnuser -Ddb.service=service

Now John has a fully functional database that he can use as his local sandbox for development. I guess a little bit of explaining is in order. In the lines above with sqlplus commands, the first parameter is connection settings. The second parameter is the script to execute, and everything after that are parameters to the script. On line 2 the script uses the inputs to create a user and schema called johnuser and with the testpw password. It also creates tables, triggers, functions etc.

After creating the complete schema in line 2, the baseline data is inserted. I am not sure if this is a good term, but by baseline data I mean data that needs to be there for the system to operate, and don’t change during normal processing. You might have a admin interface to change it, but for most of the time they stay the same. This could be tables holding countries or postal codes.

After inserting some baseline data, it is time to insert some test data in step 3, such that John has something to experiment with right out of the box. This is separated in a script because not all environments will need those data.

The last step is done to upgrade the database to the latest version. See the migrations article for an explanation of the concept. This means that the baseline script is not updated with changes all the time. Every now and again we generate a new baseline from the production database, so we can delete some of the old migrations. Generating a new baseline is something I havn’t really found a good tool for yet, so I get the DBA to do it with some of his tools. It happens rarely enough that for now, I accept that it’s not automated.

That really is the last part of my database articles for now. It is a topic I will probably write more about later as it is something that has been handled poorly in most projects I haven been in. It is also an important part of what I like to call agile deployment that helps us reduce the time spent on deploying, and fixing all those pesky little errors we do when deploying.


Migrations for Java

If you are familiar with Ruby on Rails you know what Migrations are. The same thing can and should be done in Java, it’s just not that well known.

Why migrations? Because it enables you to automatically update any environment you have to the latest version. And this is done through source control closely tied to the code. This means that every time a developer checks in a change to the database it gets propagated to all the developer sandboxes. The person responsible for deploying into the test environment doesn’t have to know which changes are to be applied or not. This is an automatic process that has been tested by CI already. But more on this later.

Simple migration

For those of you that don’t know what migrations are, a migration is a change set. An example would be a change that adds a column to a table. Writing a migration is then writing eirther code or SQL that represents an alter table add column statement. You do not update the original create table statement. For RoR this means some Ruby code, and with the current tools in Java it means writing SQL. I’m not quite sure which one of them I actually prefer, but when making changes to tables with millions of rows I prefer to have as much control of the SQL as possible. The rest of this entry will be focused on the Java way with writing SQL.

So when I wan’t to make a change to the database I would create  a new file calles something like 0325_add_comment_column_to_user.sql :


You might notice that this is not portable. VARCHAR2 is an Oracle specific thing. How often do you change database implementations? Not very often, and your code is still vendor independent since you’re using iBatis or Hibernate. Also, during a change of databases you need to port the schema, which will then also include this change because you will port the base line.

Base line

Base line you say? For these migrations to work you will need something that is already there. You obviously can’t do a alter table without a create table first. The create table migration is a bit back in time, but it has been done. So if everything that is done in your database is migrations in separate files you will get a lot of scripts after some time. To prevent this; at regular intervals generate a new script containing the schema that represents the current state of your database. This is from production, or some “production like” environment. Then you can delete all the scripts that lead to that base line, and keep adding new scripts.

This also makes it easier to create new instances of your database. You first run the schema from the base line. In this base line it is included everything up until revision 307. So when you have done this, you need to apply the latest migrations that are missing up until 325 which we created above.

The harder migrations

One important concept with migrations is that you can’t just think about the schema. You will also need to take into consideration the data in tables. This wasn’t really that hard when adding a column that can be null. If we were adding a column that had a ‘not null’ constraint we would need to figure out what to fill in. Maybe this was a default, or maybe you’ would have to calculate the new values. To achieve this you must write PL/SQL or something similar, which would scare a lot of developers off. I personally think that we should all know enough database stuff to be able to do this. It is not just useful in creating migrations, but for gaining a better understanding of databases and how we can use them effectively. I wish I could show you a harder migration here, but I actually don’t have anything to experiment on at hand, so I just hope you get the picture.

The tools

We are using a simple tool called DBDeploy. It’s a very simple tool that requires a table in the database. This table holds which revision has already been applied, and it uses this information to find out which numbered scripts on the disk is to be applied. In the case of the base line above that would be scripts 308 through 325. It does no processing of the scripts, and just puts it together in a new file with some updates to the changelog table in between. This means that you can pretty much write whatever you want inside these scripts. The test to see if it is working is by running the resulting file. Because DBDeploy actually doesn’t execute anything, we have made a simple ant integration with sqlplus for executing the statements. We did use to call the sql task in Ant, but found that we needed some of the flexibility that sqlplus did offer us to be able to perform these migrations in a good way.

We are also using Maven to package and release these scripts in the same manner as we release code. That way we can always go back to see what DB state equals the code, and we can pull down official releases from the Maven repository. It also integrates nicely into a Continious Integration scheme. Every time someone checks in a change it is automatically tested against a separate database. This way we catch most errors instantly and can fix them while we still remember what we were doing.

Some special cases

Simple is not easy. So there are some things that you need to be aware of, and some things that are not really handled that well. I do however belive that it is worth it regardless. Some of the things we have discovered:

  • You can not change a migration after it is checked in to source control – The script has already been applied to a database and will not be run again no matter how many changes you do to the script after first check in. There are some exceptions to this, but the basic rule is that it can not be changed if the first script was succesful. You might need to write an alter table statement for the alter table you just did.
  • Branches – Since the scripts are number based you there will be a problem when merging a branch into your trunk. What if both trunk and branch has a number 62, but with different contents? First of all, DBDeploy will fail because it finds two scripts with the same number. So you’ll need to rename the script that you just merged in with a different number. But you can’t give it the last number, since some of the other scripts without conflicts from the branch might be dependent on the one with a conflict. So you should always make sure the change in the branch uses a number higher than the one in trunk, and merge that as fast as possible down into trunk so anyone else doesn’t use that number. Not very good, I admit, but it’s the best solution so far.
  • Synonyms – Synonyms will be dependent on the schema names. This is not something you must do with DBDeploy, but we prefer to keep our scripts independent on user names. This enables us to have several instances for different purposes in the same environment. We have actually added a Ant search and replace to enable us to write scripts with synonyms, but replace with the correct db schema at runtime.

The stuff I haven mentioned here is very much still a work in progress, and my goal is that we will be able to change our database as we need instead of putting it off until it becomes a separate migration/rewrite project. We need to be able to make improvements with as little hassle as possible so out systems won’t rot away.