Microservices Databases Migrations

Nowadays users demand applications that constantly run and can be updated without the user having to drop off. To do that, developers need to integrate microservices. But what happens with our database? How do we apply changes to a database without locking out or disconnecting users? This article will explain how to achieve a zero downtime database migration.

Microservices

As software engineers, we are always working to deliver better applications for our users. We want to deliver the fastest and most reliable applications that are always available to the users when they need them.

Developers often implement new strategies in the software lifecycle to make delivery faster. Instead of deploying code twice in a month, teams often want to deploy several times a day. This enables large applications to be accessible to millions of users and be frequently updated.

There are many different approaches that software architects can apply when working with microservices. Microservices is a buzzword that we hear a lot in the software community, but the concept is pretty simple: decreasing the size of the manageable components that make up the application. It’s all about cohesion and coupling.

Zero Downtime Rollouts

This article will focus on a couple features of the microservices approach, not because the other ones are not important, but because we want to focus on one important aspect: making sure that our application is online all the time (or at least a fraction of 99% of the time).

NEW RESEARCH: LEARN HOW DECISION-MAKERS ARE PRIORITIZING DIGITAL INITIATIVES IN 2024.

In the monolith world, rollouts of versions were often an issue for the operations team. But how can we build an application that consists of a set of components that can be kept alive while other components are not responding? The answer is pretty simple: having multiple instances of each component and load balancing them. If you have multiple instances of the same piece of code answering your users requests, you can incrementally rollout your version without people feeling that something is happening.

Having multiple instances of the same component can be easily done when you have a small, stateless component to handle. Using a modern application http server, Docker and tools like Kubernetes, you’ll be able to accept requests instantly. Kubernetes can be configured to deploy a predefined number of replicas of a container and it will load balance the incoming requests. When you roll an update, Kubernetes will update incrementally, ensuring that there are services in both old and new versions ready for answer requests.

The Problem with Databases

While the previous steps have been easy, the difficult part is about to start. Having stateless services is achievable – the question is, how will we handle the database, that is, by all means, stateful? As Edson Yanaga likes to say: Code is easy, state is hard.

The right tools and patterns make it easy to break code into small, stateless, and immutable services. But dealing with the state of the application, is when the task starts to get nerve wracking. So what tools and techniques can help us to achieve zero downtime version upgrades on the many databases we can have?

Database Migrations

The first design decision we need to make is the way we apply changes to our schemas. We need to consider that now we cannot simply change the schema and the application will be updated when deploying it. We need to make sure that every database schema change is backward compatible, which means that both the old and the new application code will be able to read and write data at the same time. Even if the rollout takes less than a minute, we need to make sure both versions will be able to execute statements against the database.

How can we do that? Amazing tools can help us to manage the database versions migrations; Flyway and Liquibase are good examples. Those tools are responsible for executing SQL database change scripts, making sure every change that is made is versioned and ensuring that change would not be executed again. Flyway keeps a table with the SQL files versioned and with a hash of the code, so in case you try to change a SQL script that was applied again, the migration will fail on your deployment pipeline.

But the real work here is writing the SQL changes. Every time the developers write an alter table statement they need to keep in mind this key concept here: both old and new application versions will need to write/read on this schema.

The monolith development flow usually looks like: the developer drops an existing column on the database, and changes the INSERT/UPDATE statements on the database that are referencing this column. While the application is being deployed, during a small window of time when the service is offline, the DBA will execute the statements and the new application version will be deployed.

Let’s now dive into how a microservices development flow should look. At this moment in time the service and its database are running stable on production. The developer needs to move data that, in the current version, lives on a column of a given table that is to be updated to a brand new separate table with a relationship.

In the new version of the code, the developer can create the new database and define a relationship between the tables, but it cannot drop the old column. The new application code needs to write in both the old column and new column. This step is needed for the small timeframe of the rollout and in case of a problem when the rollback of the version is executed.

In this scenario it is required to have a strategy to copy the data that is inserted in the old column after the database migration is done. Our asynchronous task would solve this problem. We will talk more about those strategies later on this article.

The current stable version has both old and new data domains working as expected. In a future version, the developer wants to apply a new change that drops the old column, directing the write and read operations for the new schema.

There are different strategies that an engineer can execute on the schema of a database. But there are a few rules that should be followed that would make the database migrations easier.

A destructive operation (drops and deletes) should never be executed in the same version that the new schema is being created. Future versions, after the new code is reliable enough, are the best moments to execute it, as a clean up task.
Data validation constraints should never be created in the same version in which the new code is created. Again, future versions are the best moment for that.
Triggers and views are native objects on the common database systems that can be used to keep a middleware to sync data between the old and the new database models. Triggers can help in scenarios where the same data needs to be written in many places in the same database. Views provide many read places for the same data. Future versions of those objects can be dropped if needed.

Flyway and your Delivery Pipeline

So far we understand what are and why we want database migrations, how they can be implemented using Flyway, and what practices engineers should follow while writing SQL code. But, when does the flyway come to action during the deployment process? There are different ways to run a flyway migration, and its API provides a command line client and many different integrations with different application frameworks. For small services that connect on their own database, using an integration is a valid strategy. Using Quarkus, for example, you can add Flyway as an extension and it will run the migration when deploying the service.

To avoid any unnecessary coupling between the application and Flyway, the command line is a good strategy too, as a step of your pipeline. If the SQL was written to meet the backward compatibility requirements, the migration should work smoothly without affecting the legacy code that is still running. As a next step, the application can be incrementally deployed since the database is now updated and ready to accept data from both legacy and new code.

In case something goes wrong, you can provide rollback scripts and Flyway will be smart enough to execute those, changing the schema back to the old version. For example, if you create a few tables and relationships in a given version, you can provide for the same version drops for the new objects. Those commands will be executed reverting the structure that it was created before.

Final Thoughts

Dealing with the application state can be a challenge when moving towards microservices. There are different strategies to dealing with the application state. Often mistakes that are made during the decision making process put the entire work at risk. Use existing concepts in the software community and successful migration examples to make the decisions. The relational databases’ systems are full of capabilities to help execute soft migrations that will not break the compatibility of the application. The community is full of interesting tools that can help you manage versions of your databases, and considering microservices need a fair amount of databases, additional tools will become handy pretty fast.

Posted in DevOps

Wesley Fuchter

Wesley Fuchter is a Senior Principal Engineer at Modus Create, with over 13 years of experience building cloud-native applications for web and mobile. Working as a tech leader he's spending most of his time working closely with engineering, product, and design to solve customers' business problems. His experience sits at the intersection of hands-on coding, innovation, and people management with a mix of experiences going from AWS, Java and TypeScript to startups, agile and lean practices.