
Time to re-evaluate my relationship to databases

Pun totally intended. 😜

TL;DR: The database landscape often presents a false choice: you either use heavily normalized relational databases or document databases. But there is a more nuanced middle ground that can offer the best of both worlds. Or, as Dan North calls it: the Best Simple System for Now.

Recent discussions have made me realise that the way I do databases is unconventional to some. So I thought I would describe some of the techniques and assumptions I rely on.

Using the right technique for a specific situation can make writing applications simpler and faster, and it makes solving business needs easier. It reduces complexity and simplifies development. 🚀 Read on for the techniques.

Just Use Postgres for Everything offers some nice pointers. It shows how you can solve some specific problems in less conventional ways.

Read on…

First: Databases are fast

Modern databases are incredibly efficient – performance bottlenecks typically stem from application code rather than database limitations. Before adding layers of complexity with services and queues, evaluate whether you’re using your database to its full potential.

We would all love to be Facebook or Netflix, but we’re not. Until then, the database will usually cover your needs; you just have to find better ways to use it for your specific use cases.

Analyze, refactor and tune. Reduce the number of queries, optimize them, and batch execution. The techniques below will improve your chances of scaling far.
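Batching is one of the easiest wins. A minimal sketch of the idea, using Python’s stdlib `sqlite3` driver as a stand-in for Postgres (the `vehicle` table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT)")

rows = [(str(i), f"vehicle-{i}") for i in range(1000)]

# One batched statement instead of 1000 separate round trips.
conn.executemany("INSERT INTO vehicle (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM vehicle").fetchone()[0]
print(count)  # 1000
```

In Postgres the same idea is a multi-row `INSERT` or `COPY`; the point is that the round trips, not the database, are usually the bottleneck.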

Use UUIDs as primary keys

UUIDs provide end-to-end consistency in your data lifecycle. You can generate them at any point, whether in your frontend client or a backend service. Because they are globally unique, they simplify distributed systems and help prevent subtle data integrity issues.

If you need to keep track of which ones are persisted or not, you can model that explicitly in your domain (persisted=true), though it is usually not necessary. Don’t rely on the existence of an ID to know an object’s persistence state.

Because UUIDs are globally unique, they can help you avoid subtle consistency issues. I have experienced first hand that when we switched to UUIDs, we quickly uncovered a bug where the wrong foreign keys were being used. Since the IDs were no longer the same across tables, the foreign key constraint kicked in.

If you Google it, you will find warnings about UUID primary keys not being performant, but I have never experienced any issues. If it does become a problem, you can use better UUIDs (UUIDv7); since Postgres 18 this comes built in. Or fall back to regular primary keys in the specific places where the problem arises.
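The core point is that the application, not the database, owns the ID. A minimal sketch with Python’s stdlib `uuid` and `sqlite3` as a stand-in for Postgres (the `vehicle` table is hypothetical):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT)")

# The ID is generated in application code, before any INSERT happens.
vehicle_id = str(uuid.uuid4())
conn.execute("INSERT INTO vehicle (id, name) VALUES (?, ?)", (vehicle_id, "truck-1"))

# The same ID can be handed to other services or used as a foreign key
# without ever asking the database what it assigned.
row = conn.execute("SELECT name FROM vehicle WHERE id = ?", (vehicle_id,)).fetchone()
print(row[0])  # truck-1
```

In Postgres you would use a native `uuid` column instead of `TEXT`, and `uuidv7()` if you want index-friendly ordering.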

Think in aggregates

Think in aggregates. An aggregate is something like a vehicle and all the data that only lives in the context of that one specific vehicle. You can store lots of data in the same table with JSON columns. Don’t normalise eagerly.

The status history can be represented as a JSON array.

  • A transport table can have an address object as JSON in one column.
  • Two orders for the same person with different addresses (not normalised out into a separate table) can both be correct. They are historical records of the order, not of the person.

Having fewer tables and fetching the whole aggregate at once has several advantages. It eliminates many joins, eases binding from SQL to objects, and it reduces the number of queries you have to write.

You will have to learn ways to query JSON in your (preferred) DB, but it is worth it. They are surprisingly performant. At least in Postgres.

Migrations can be a bit more painful, but you can do it. I believe in you. 😉
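To make the aggregate idea concrete, here is a minimal sketch using SQLite’s built-in JSON functions as a stand-in; the `transport` table, its columns, and the data are all hypothetical. In Postgres you would use `jsonb` columns and operators like `->>` instead of `json_extract`:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transport (
        id TEXT PRIMARY KEY,
        address TEXT,          -- JSON object: one column instead of a joined table
        status_history TEXT    -- JSON array of status changes
    )
""")

# The whole aggregate is written, and later read, in one statement.
conn.execute(
    "INSERT INTO transport (id, address, status_history) VALUES (?, ?, ?)",
    (
        "t-1",
        json.dumps({"street": "Main St 1", "city": "Oslo"}),
        json.dumps([{"status": "created"}, {"status": "dispatched"}]),
    ),
)

# Query inside the JSON without normalising it out into separate tables.
city = conn.execute(
    "SELECT json_extract(address, '$.city') FROM transport WHERE id = ?", ("t-1",)
).fetchone()[0]
print(city)  # Oslo
```

One table, no joins, and the address stays a historical fact of this transport rather than a shared, mutable row.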

Don’t use Object-Relational Mappers

While ORMs promise developer productivity, they often introduce more complexity than they save. Your team needs to deeply understand ORM internals: lazy loading, flushing behaviour, and caching strategies. Without this knowledge, you will find yourself fighting the framework more than leveraging it. Take it from someone who has debugged them to death.

Flushing happens when you least expect it. IDs are fetched one at a time when you insert your gigantic collection. Everything is fetched without you understanding why and when. There is simply too much magic.

I know, I know, it seems scary to write “all that SQL”. But follow some of the other recommendations and there will not be that much. 🙂
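What “all that SQL” tends to look like in practice is just explicit queries with explicit mapping. A minimal sketch, again with stdlib `sqlite3` as a stand-in and a hypothetical `Vehicle` aggregate:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Vehicle:
    id: str
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO vehicle VALUES (?, ?)", ("v-1", "truck"))

def find_vehicle(conn, vehicle_id):
    # Explicit SQL, explicit mapping: no lazy loading, no flush surprises,
    # no hidden queries, the statement runs exactly when this line runs.
    row = conn.execute(
        "SELECT id, name FROM vehicle WHERE id = ?", (vehicle_id,)
    ).fetchone()
    return Vehicle(*row) if row else None

v = find_vehicle(conn, "v-1")
print(v.name)  # truck
```

With aggregates and JSON columns, functions like this stay small because there are few tables and few joins to map.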

Dedicated queries are (mostly) good

Do not obsess over having just one query for one thing, like fetching the vehicles. Don’t go overboard, but specific queries are easier to fix when there is a performance problem. You can share common parts, like the field names in the query, through code (just use a const?).

I would probably work harder to make INSERT and UPDATE be common, than SELECT. Your mileage may vary. 🤷‍♂️
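The “share the field names through a const” idea can be as simple as this sketch (table and fields hypothetical):

```python
import sqlite3

# One shared column list, many dedicated queries.
VEHICLE_FIELDS = "id, name, status"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT, status TEXT)")
conn.execute("INSERT INTO vehicle VALUES ('v-1', 'truck', 'active')")

# Each query stays dedicated, and can be tuned independently,
# but the column list lives in exactly one place.
by_id = f"SELECT {VEHICLE_FIELDS} FROM vehicle WHERE id = ?"
by_status = f"SELECT {VEHICLE_FIELDS} FROM vehicle WHERE status = ?"

row = conn.execute(by_status, ("active",)).fetchone()
print(row)  # ('v-1', 'truck', 'active')
```

If one of the queries later needs an index hint or a rewrite, you change that query alone without touching the others.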

Design, then monitor

Design with the least overhead, make it easy to map into the DB, and think in aggregates. Then monitor the performance.

There are tools to monitor on the DB side. However, I prefer the ones that provide deep insights on the application side, tying endpoints and code to DB operations. When you tie things together like that, you can easily spot patterns. One endpoint often calls the same DB query hundreds of times per request; it might be fast as hell, but it shouldn’t be called that many times.

Use EXPLAIN to see what is really slow in your query, and use a proper APM tool. I like New Relic, but you should at least use an OpenTelemetry agent.
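Asking the planner for its plan is cheap enough to do habitually. A sketch using SQLite’s `EXPLAIN QUERY PLAN` as a stand-in (Postgres’s `EXPLAIN ANALYZE` gives richer output, including actual timings; the table and index are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX idx_vehicle_status ON vehicle (status)")

# Ask the planner how it intends to run the query
# before trusting it in production.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM vehicle WHERE status = ?", ("active",)
).fetchall()
for row in plan:
    print(row)
```

Here you want to see the plan searching via `idx_vehicle_status` rather than scanning the whole table; a full scan on a hot query is usually the “really slow” part EXPLAIN exposes.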

Change is (can be) easy

Many approaches to databases stem from the belief, ‘let’s get this right, because change is hard.’ And changing databases is sometimes hard(er). But like anything else: practice makes perfect.

If you start out by avoiding changes, you won’t be ready to perform the big change when you really need to. So keep making changes, keep them small, and learn how to evolve your database. The small changes will prepare you for the big change one day. And it has a funny way of helping you think in small steps, so most of the big ones go away.

Add as needed, to get the practice that makes any change just another small daily step.

Use migrations. Flyway is nice if you’re in Java land.
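The mechanism behind tools like Flyway is small enough to sketch: versioned changes, applied in order, recorded in a bookkeeping table so reruns are safe. A hypothetical minimal runner (the `vehicle` migrations are made up; real tools keep the SQL in versioned files):

```python
import sqlite3

# Ordered, versioned migrations. Flyway does the same with V1__...sql files.
MIGRATIONS = [
    ("V1", "CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT)"),
    ("V2", "ALTER TABLE vehicle ADD COLUMN status TEXT"),
]

def migrate(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version TEXT PRIMARY KEY)"
    )
    applied = {r[0] for r in conn.execute("SELECT version FROM schema_version")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # Idempotent: already-applied versions are skipped.

cols = [r[1] for r in conn.execute("PRAGMA table_info(vehicle)")]
print(cols)  # ['id', 'name', 'status']
```

Each small migration is one of those small daily steps; the bookkeeping table is what makes running them on every deploy boring, which is the goal.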

Some givens

I realise I still take some things for granted. Sorry about that, but I can’t describe everything here. So just to be explicit, here are some things that come to mind. Maybe I will discuss these in depth later:

  • Know your transaction boundaries, and manage transactions with errors.
  • Handle concurrent updates; there are usually fewer of them if you think in aggregates. Use optimistic locking if you have to.
  • Use connection pooling. New connections are really slow.
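Optimistic locking from the list above deserves a quick sketch: a version column, bumped on every update, guards against lost updates. Again stdlib `sqlite3` as a stand-in, with a hypothetical `vehicle` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicle (id TEXT PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.execute("INSERT INTO vehicle VALUES ('v-1', 'truck', 1)")

def update_name(conn, vehicle_id, new_name, expected_version):
    # Only update if nobody else bumped the version in the meantime.
    cur = conn.execute(
        "UPDATE vehicle SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, vehicle_id, expected_version),
    )
    # rowcount 0 means someone got there first: retry or report a conflict.
    return cur.rowcount == 1

ok = update_name(conn, "v-1", "lorry", expected_version=1)
stale = update_name(conn, "v-1", "van", expected_version=1)  # lost the race
print(ok, stale)  # True False
```

No locks are held between read and write, which is exactly why it scales; the cost is that the losing writer has to handle the conflict explicitly.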


In the end…

You want to deliver functionality with minimal complexity and cost. There is no silver bullet, but these techniques usually pay off. At least treat them as defaults, and do something different when a real need arises.

I like them because moving away from them usually isn’t that costly when you have to. Switch to a different type of storage for parts of your domain if you must.

Most of this is not revolutionary. However, I see time and time again that some of these solutions are new to developers.

I hope you discovered at least one new thing that you can try out. And let me know what you disagree with. Or how you do things differently.

Oh… Remember to backup your database. And protect the backups. I learned this the hard way. 😉

Thanks to Øyvind Asbjørnsen, Tore Engvig and Trond Marius Øvstetun for feedback.
