
Time to re-evaluate my relationship to databases

Pun totally intended. 😜

TL;DR: The database landscape often presents a false choice: either heavily normalized relational databases or document databases. But there is a more nuanced middle ground that can offer the best of both worlds. Or, as Dan North calls it: the Best Simple System for Now.

Recent discussions have made me realise that the way I do databases is unconventional to some. So I thought I would describe some of the techniques and assumptions I rely on.

Using the right technique for a specific situation can make writing applications simpler and faster, and it makes solving business needs easier by reducing complexity and simplifying development. 🚀

Just Use Postgres for Everything offers some nice pointers. It shows how you can solve some specific problems in less conventional ways.

Read on…

First: Databases are fast

Modern databases are incredibly efficient – performance bottlenecks typically stem from application code rather than database limitations. Before adding layers of complexity with services and queues, evaluate whether you’re using your database to its full potential.

We would all love to be Facebook or Netflix, but we’re not. So until then the DB will usually cover your needs. You just have to find better ways to use it for your specific use cases.

Analyze, refactor, and tune. Reduce the number of queries, optimize them, and batch execution. Follow the techniques below to improve your chances of scaling far.

Use UUIDs as primary keys

UUIDs provide end-to-end consistency in your data lifecycle. You can generate them at any point, whether in your frontend client or backend service. Because they are globally unique, they simplify distributed systems and help prevent subtle data integrity issues.

If you need to keep track of which ones are persisted or not, you can model that explicitly in your domain (persisted=true). It is usually not necessary. Don't rely on the existence of an ID to know its persisted state.

Because UUIDs are globally unique, they can help you avoid subtle consistency issues. I have experienced this first hand: when we switched to UUIDs, we quickly uncovered a flaw where the wrong foreign keys were used. Since the UUIDs were not the same across tables, the foreign key constraint kicked in.

If you Google it, you will find warnings about UUIDs not being performant, but I have never experienced any issues. If it becomes a problem, you can use better UUIDs; since Postgres 18 this comes built in. Or fall back to regular primary keys in the specific places where the problem arises.
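
To make it concrete, here is a minimal sketch of generating the ID in application code before anything touches the database. The vehicle table and the JDBI usage are just assumptions for illustration:

    import org.jdbi.v3.core.Jdbi
    import java.util.UUID

    // The ID is created in code, not by the database, so it exists
    // before the row does and can be shared with other systems right away.
    data class Vehicle(val id: UUID = UUID.randomUUID(), val name: String)

    fun insertVehicle(jdbi: Jdbi, vehicle: Vehicle) {
        jdbi.useHandle<Exception> { handle ->
            handle.createUpdate("INSERT INTO vehicle (id, name) VALUES (:id, :name)")
                .bind("id", vehicle.id)
                .bind("name", vehicle.name)
                .execute()
        }
    }

The same UUID can be returned to the client or put on a message before the INSERT has even run.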

Think in aggregates

An aggregate is something like a vehicle and all the data that only lives in the context of that one specific vehicle. You can store lots of data in the same table with JSON columns. Don't do eager normalisation.

Some examples:

  • The status history can be represented as a JSON array.
  • A transport table can have an address object as JSON in one column.
  • Two orders with different addresses (not normalised out into a table) for the same person can be correct. They are likely historical records of the order, not the person.

Having fewer tables and fetching the whole aggregate at once has several advantages. It eliminates many joins, eases binding from SQL to objects, and it reduces the number of queries you have to write.

You will have to learn ways to query JSON in your (preferred) DB, but it is worth it. JSON queries are surprisingly performant, at least in Postgres.
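
As an illustration, here is a rough sketch of such a query. It assumes a hypothetical vehicle table with a jsonb data column holding the aggregate, accessed through JDBI; the names are made up:

    import org.jdbi.v3.core.Jdbi

    // One query, no joins: the whole aggregate lives in the jsonb column.
    fun findVehicleJsonByCity(jdbi: Jdbi, city: String): List<String> =
        jdbi.withHandle<List<String>, Exception> { handle ->
            handle.createQuery(
                "SELECT data::text FROM vehicle WHERE data -> 'address' ->> 'city' = :city"
            )
                .bind("city", city)
                .mapTo(String::class.java)
                .list()
        }

From there you map the JSON into your domain objects with whatever JSON library you already use, and add an index on the JSON expression if the query ever shows up in your monitoring.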

Migrations can be a bit more painful, but you can do it. I believe in you. 😉

Don’t use Object-Relational Mappers

While ORMs promise developer productivity, they often introduce more complexity than they deliver in benefits. Your team needs to deeply understand ORM internals: lazy loading, flushing behaviour, and caching strategies. Without this knowledge, you will find yourself fighting the framework more than leveraging it. Take it from someone who has debugged them to death.

Flushing happens when you least expect it. IDs are fetched one at a time when you insert your gigantic collection. Everything is fetched without you understanding why and when. There is simply too much magic.

I know, I know, it seems scary to write “all that SQL”. But follow some of the other recommendations and there will not be that much. 🙂

Dedicated queries are (mostly) good

Do not obsess over having just one query for one thing, like fetching the vehicles. Don't go overboard, but specific queries are easier to fix when there is a performance problem. You can share common things, like the field names in the query, through code (just have a const?).
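
For the "just have a const" part, this is roughly what I mean; the table and column names are made up:

    // Defined once, so all the queries stay in sync when a column is added or renamed
    const val VEHICLE_FIELDS = "id, registration, data"

    val findByIdSql = "SELECT $VEHICLE_FIELDS FROM vehicle WHERE id = :id"
    val findByRegistrationSql = "SELECT $VEHICLE_FIELDS FROM vehicle WHERE registration = :registration"
    val insertVehicleSql = "INSERT INTO vehicle ($VEHICLE_FIELDS) VALUES (:id, :registration, :data)"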

I would probably work harder to make INSERT and UPDATE be common, than SELECT. Your mileage may vary. 🤷‍♂️

Design, then monitor

Design with the least overhead, make it easy to map into the DB, and think in aggregates. Then monitor the performance.

There are tools to monitor on the DB side, but I prefer the ones that provide deep insights on the application side, tying endpoints and code to DB operations. When you tie things together like that, you can easily spot patterns: one endpoint often calls the same DB query hundreds of times per request. It might be fast as hell, but it shouldn't be called that many times.

Use EXPLAIN to see what is really slow in your query, and use a proper APM tool. I like New Relic, but you should at least use an OpenTelemetry agent.
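
If you want to poke at a query from code rather than a SQL console, a quick and dirty helper like this works. It just prefixes the query with EXPLAIN ANALYZE and prints the plan; JDBI is assumed, and it only suits queries without bind parameters:

    import org.jdbi.v3.core.Jdbi

    // Dev/test helper only: EXPLAIN ANALYZE actually executes the query.
    fun printQueryPlan(jdbi: Jdbi, sql: String) {
        jdbi.useHandle<Exception> { handle ->
            handle.createQuery("EXPLAIN ANALYZE $sql")
                .mapTo(String::class.java)
                .list()
                .forEach(::println)
        }
    }

Look for sequential scans on large tables and row estimates that are wildly off.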

Change is (can be) easy

Many approaches to databases stem from the belief "let's get this right, because change is hard." And changing databases is hard(er) sometimes. But like anything else: practice makes perfect.

If you start out by avoiding changes, you won't be ready to perform the big change when you really need to. So keep doing changes, keep them small, and learn how to evolve your database. The small changes will prepare you for the big change one day. And it has a funny way of helping you think in small steps, so most of the big ones go away.

Add as needed, to get the practice that makes any change just another small daily step.

Use migrations. Flyway is nice if you’re in Java land.
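
If you are not running it from a build plugin, running Flyway programmatically at application startup is roughly this (the connection details are placeholders):

    import org.flywaydb.core.Flyway

    // Applies any pending SQL migrations (by default from db/migration on the classpath)
    fun migrate(url: String, user: String, password: String) {
        Flyway.configure()
            .dataSource(url, user, password)
            .load()
            .migrate()
    }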

Some givens

I realise I still take some things for granted. Sorry about that, but I can't describe everything here. So, just to be explicit, here are some things that come to mind. Maybe I will discuss these in depth later:

  • Know your transaction boundaries, and manage transactions together with error handling.
  • Handle concurrent updates. There are usually fewer of them if you think in aggregates. Use optimistic locking if you have to (see the sketch after this list).
  • Use connection pooling. New connections are really slow.
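
For the optimistic locking point, a minimal sketch with a version column could look like this; the vehicle table and the JDBI usage are, again, just assumptions:

    import org.jdbi.v3.core.Jdbi
    import java.util.UUID

    // Returns true if the update won the race. False means someone else changed
    // the row since we read it, and the caller has to re-read and retry (or fail).
    fun renameVehicle(jdbi: Jdbi, id: UUID, newName: String, expectedVersion: Long): Boolean =
        jdbi.withHandle<Boolean, Exception> { handle ->
            val rows = handle.createUpdate(
                "UPDATE vehicle SET name = :name, version = version + 1 WHERE id = :id AND version = :version"
            )
                .bind("name", newName)
                .bind("id", id)
                .bind("version", expectedVersion)
                .execute()
            rows == 1
        }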

Other interesting topics to explore

In the end…

You want to deliver functionality with minimal complexity and cost. There is no silver bullet, but these techniques usually pay off. At least think of them as defaults, and do something different when a real need arises.

I like them because moving away from them usually isn't that costly when you have to. Switch to a different type of storage for parts of your domain if you need to.

Most of this is not revolutionary. However, I see time and time again that some of these solutions are new to developers.

I hope you discovered at least one new thing that you can try out. And let me know what you disagree with. Or how you do things differently.

Oh… Remember to backup your database. And protect the backups. I learned this the hard way. 😉

Thanks to Øyvind Asbjørnsen, Tore Engvig and Trond Marius Øvstetun for feedback.


JUnit and ParameterResolver — Caching database connections in your tests

This is a re-post of an original Medium article. As I am moving my content here I will re-post some content.



We aim for fast tests, ideally completing all tests within 30 seconds. Currently, our tests take 1 minute and 30 seconds, but we are determined to get there. 😃 To achieve this goal, we must reduce the overhead of each run.

Read on to learn about the techniques we use to speed up DB connection handling and migrations.


We aim to minimize the number of database tests we write by using fakes, but we still require some tests to verify our DB layer. We just don't want the majority of our tests slowed down by network access (even locally in Docker).

Establishing connections takes time, and the overhead of checking migrations before each test can also be time-consuming. We usually run a persistent DB in a Docker container, so we only have to create connections and migrate once per run. We do use Testcontainers if no DB is available, but it is slower. Every millisecond counts. 😃

Therefore we looked for a way to minimize this when running thousands of tests.

JUnit enforces strict isolation between tests, so you can't just inherit from a class or something and get a shared value across the run. But since we knew Spring caches contexts across tests, there had to be a way. JUnit ParameterResolvers come to the rescue:

import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.api.extension.ExtensionContext.Namespace
import org.junit.jupiter.api.extension.ParameterContext
import org.junit.jupiter.api.extension.ParameterResolver

class DatabaseTestExtension : ParameterResolver {

    private val STORE_NAME = "main-database"

    override fun supportsParameter(parameterContext: ParameterContext, extensionContext: ExtensionContext): Boolean {
        return parameterContext.parameter.type == Database::class.java
    }

    override fun resolveParameter(parameterContext: ParameterContext, extensionContext: ExtensionContext): Any {
        // We use the root store to avoid loading and migrating the DB for each test/class.
        // It can still be resolved per thread, so initialization is not guaranteed to happen only once.
        val store = extensionContext.root.getStore(Namespace.create(DatabaseTestExtension::class.java.simpleName))
        val db: Database = (store.get(STORE_NAME) as Database?) ?: Database(Config.load()).also {
            // New object, so initialize it and put it in the store
            it.initializeAndMigrate()
            store.put(STORE_NAME, it)
        }
        return db
    }
}

The Database and Config objects are just custom wrappers around Hikari/JDBI and Liquibase. You can store the JDBI object or a Hikari connection pool directly by changing the code above and adjusting the class type. The important part is putting it in the store so it is reused across tests in the run.

To use it you do something like this:

@Test
@ExtendWith(DatabaseTestExtension::class)
fun testSomething(db: Database) {
    ...
}

Pro tip: You can add the ExtendWith annotation to the entire test class, not just the test method.

And just like that you have a resource that won’t take extra time/load to run when running all your tests. 😃


Subscribe for further updates 🙂


Java migration tools

Wow, it’s been a while. If you’re interested in good links, follow me on Twitter: http://twitter.com/anderssv. I usually post updates there these days.

My talk on “Agile deployment” got accepted for JavaZone this year! I’m extremely happy, but a bit scared too. 🙂 I’ll be talking about rolling out changes in a controlled manner, and one of the things that are usually neglected in this scenario is the database side. I’ll cover stuff like packaging and deployment of the application too, but that’s probably the area where I know the least. The database side of things is really sort of my expertise.

I have written some blog posts on this already, and in relation to the talk and things at work I did a quick search for Java migration tools. I have used DBDeploy before, but there are now a couple of other contenders. Here’s my list so far of tools that work on SQL deltas that can be checked into SCM:

  • DBDeploy – Tried, few features but works well. Ant based.
  • DbMaintain – Probably has the most features. Ant based.
  • c5-db-migration – Interesting alternative, similar to DBDeploy. Maven based.
  • scala-migrations – Based on the Ruby on Rails migrations. Interesting take.
  • migrate4j – Similar to Scala Migrations, but implemented in Java.
  • Bering – Similar to Scala Migrations, and looks a lot like Migrate4J.

I’ll definitely be looking into DbMaintain and c5-db-migration soon. DbMaintain looks promising, or I might just contribute some features to DBDeploy. I’ll let you know how it went. 🙂

(updated with scala-migrations, bering and migrate4j after first post)