CQRS and Event Sourcing

For many years now we’ve been talking about microservices: the approach, its techniques, pros and cons, the wheres and whys, and how to use it. As with everything in software engineering, the microservices architecture is yet another set of patterns full of solutions and trade-offs. This article explains how event sourcing and CQRS can help you build an asynchronous, distributed application.

The Asynchronous Nature of Microservices

When talking about microservices, there are many different explanations of when and how to use them. We cannot consider microservices a silver bullet (and probably nothing else in the software architecture domain is one either). It is very important to consider the requirements we need to meet and what microservices can help us get done. It’s all about trade-offs.

Building microservices is often associated with performance and availability. That is right: microservices can help you with that, if done correctly. Building microservices requires shifting the domain and the business mindset toward a more asynchronous and reactive approach. A problem that often arises in microservices applications is an architecture in which services depend on other services to get their requests answered.

Let’s say, for example, that during an e-commerce sale the order-service calls the payment-service synchronously. What is the problem here? If the payment-service is slow for some reason, it will delay the entire sale transaction, making your software slower than it should be. Even worse, if the payment-service is offline, the whole sales transaction will fail due to an availability issue.

This scenario gives us a small but realistic reason for asynchronous processing in a distributed application. If the order and payment services communicated asynchronously, using a queue for example, the order-service’s performance and availability would no longer depend on the payment-service. Of course, we need to keep in mind that we cannot build 100% of our services asynchronously; that is not the idea here at all.
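A minimal sketch of the asynchronous version of that checkout, in Python. An in-memory `queue.Queue` stands in for a real broker, and `place_order`, `process_payments`, and the status strings are hypothetical names chosen for illustration. The point is that the order-service only publishes a request and returns, so an offline payment-service delays the payment, not the checkout.

```python
import queue

# Hypothetical in-memory stand-in for a message broker such as RabbitMQ or SQS.
payment_requests = queue.Queue()
orders: dict[str, str] = {}


def place_order(order_id: str, amount: float) -> str:
    """order-service: record the order and publish a payment request.

    It does not wait for the payment-service, so a slow or offline
    payment-service cannot block or fail the checkout request.
    """
    orders[order_id] = "PENDING_PAYMENT"
    payment_requests.put({"order_id": order_id, "amount": amount})
    return orders[order_id]


def process_payments() -> int:
    """payment-service: drain queued requests whenever it is available."""
    processed = 0
    while True:
        try:
            request = payment_requests.get_nowait()
        except queue.Empty:
            return processed
        # A real service would charge the customer here; this sketch
        # simply marks the order as paid.
        orders[request["order_id"]] = "PAID"
        processed += 1


# The payment-service is "offline": nobody consumes the queue yet,
# but the order is still accepted immediately.
status = place_order("order-1", 49.90)
print(status)             # PENDING_PAYMENT

# Later, the payment-service comes back and catches up.
process_payments()
print(orders["order-1"])  # PAID
```

In a real deployment the two functions would run in separate processes, and the order would also need a failure path for declined payments; the sketch leaves that out.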

Event Sourcing

Event sourcing, according to Martin Fowler, means: “Capture all changes to an application state as a sequence of events.” Let’s think about a personal finance app and its domain. The balance of a given bank account is the result of the historical, sequential set of transactions that happened over a period of time. A bank account owner can have incomes (salary, revenue, etc.) and expenses (a new bike, a fancy restaurant dinner, or a trip to a foreign country). Every single transaction can be stored as an event in a database, and by querying this database we can recreate the state of the balance at any given point in time. Again, in Martin Fowler’s words: “There are times when we don’t just want to see where we are, we also want to know how we got there.”
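To make the idea concrete, here is a minimal Python sketch of replaying a sequence of events to rebuild a balance at any point in time. The `TransactionEvent` shape, the `income`/`expense` kinds, and amounts in integer cents are assumptions for illustration, not a prescribed event format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TransactionEvent:
    """One immutable fact: money came in (income) or went out (expense)."""
    account_id: str
    kind: str            # "income" or "expense" (assumed event types)
    amount_cents: int    # always positive; the kind decides the sign
    occurred_on: date


def balance_at(events: list[TransactionEvent], account_id: str, as_of: date) -> int:
    """Rebuild the balance by replaying, in order, every event up to `as_of`."""
    balance = 0
    for event in sorted(events, key=lambda e: e.occurred_on):
        if event.account_id != account_id or event.occurred_on > as_of:
            continue
        if event.kind == "income":
            balance += event.amount_cents
        else:
            balance -= event.amount_cents
    return balance


events = [
    TransactionEvent("acc-1", "income", 500_000, date(2023, 1, 5)),   # salary
    TransactionEvent("acc-1", "expense", 120_000, date(2023, 1, 10)), # new bike
    TransactionEvent("acc-1", "expense", 15_000, date(2023, 2, 2)),   # dinner
]

print(balance_at(events, "acc-1", date(2023, 1, 31)))  # 380000
print(balance_at(events, "acc-1", date(2023, 2, 28)))  # 365000
```

Because the events themselves are kept, the same log answers both “where are we?” and “how did we get here?”.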

Event sourcing can certainly be done synchronously. But its architecture makes perfect sense in an asynchronous approach, which makes it an important design pattern for microservices applications. Having a third component between the transaction and balance services seems a good strategy, even better if it works asynchronously, meaning the user does not have to wait for it. For that, messaging tools such as RabbitMQ, SQS, or the much-loved Apache Kafka are good options for this asynchronous bus layer.

But what does “capture all changes as a sequence of events” really mean? It means that your architecture persists in the database not only the result of a change, but also the data that produced the new state. In our financial scenario, every transaction (income or expense), with its value and date, is stored as one row in your database, and another asynchronous process triggers a balance recalculation that stores the new balance, the resulting amount of money in the bank account. Every single time. Why? Let’s understand it better.

CQRS – Command Query Responsibility Segregation

Often, though not always, the CQRS concept is directly related to the Event Sourcing pattern. In many cases, the default Create-Read-Update-Delete (CRUD) approach is enough to read and write data for a given entity. But in some scenarios performance is an important requirement, and in those scenarios the volume of data is usually larger than a typical CRUD model handles comfortably.

Our financial system keeps a record of every financial transaction (incomes and expenses) of its users as a set of events. As a must-have feature, the same financial system needs to provide an up-to-date balance for each user’s account.

This scenario seems easy, you might be thinking: we just need to store the transactions and sum them to get the balance, storing expenses as negative values. But how big would a table holding every financial transaction of every user be after a couple of months? HUGE. A sum is a simple operation, but executing it every time someone opens the app to see how much money they have is a performance issue waiting to happen. How can we solve it?
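The naive approach can be sketched with Python’s built-in `sqlite3` module. The `transactions` table and the `naive_balance` function are hypothetical; the point is that every read recomputes the sum over all of an account’s rows.

```python
import sqlite3

# Hypothetical schema: one row per transaction, expenses stored as negatives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " id INTEGER PRIMARY KEY,"
    " account_id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"  # > 0 income, < 0 expense
)
conn.executemany(
    "INSERT INTO transactions (account_id, amount_cents) VALUES (?, ?)",
    [("acc-1", 500_000), ("acc-1", -120_000), ("acc-2", 80_000), ("acc-1", -15_000)],
)


def naive_balance(account_id: str) -> int:
    """Recompute the balance from scratch on every read.

    Without an index this scans the whole table; with an index on
    account_id it still visits every row of that account, and the
    table only grows over time.
    """
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return total


print(naive_balance("acc-1"))  # 365000
```

The result is correct; the problem is the cost of producing it on every single read.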

Defined by Greg Young, and often associated with Martin Fowler, CQRS stands for Command Query Responsibility Segregation. The idea is pretty straightforward: having separate models for writing data and for reading it. Bringing this concept to the microservices world, we would have one service responsible for writing the transactions to a simple database and another service reading the data from a per-bank-account balance table.

[Figure: CQRS]

When the customer makes a financial transaction, the transaction-service has the value and the type (income or expense) in hand. After persisting it as an event to keep track of it, the service can also pass this value on to another service that triggers the balance recalculation. This communication can be handled asynchronously by one of the event bus carriers mentioned before. The recalculation handled by the balance-service only needs simple math: adding the value to, or subtracting it from, the current balance, depending on the transaction type. In other words, the application does not need to go through millions of transactions in the database to compute the balance, which means better performance when users want to see how much money they have. Also, while the transaction is being processed, users don’t need to wait for their balance to be updated to have the transaction confirmed.
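The flow described above can be sketched in Python as follows. This is an illustration under simplifying assumptions: a `queue.Queue` stands in for the event bus, the event log and the balance table are in-memory structures, and `TransactionService`, `BalanceService`, and `TransactionRecorded` are hypothetical names. The command side appends an event and publishes it; the query side applies each event with a single addition or subtraction, so reads never scan the transaction history.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionRecorded:
    account_id: str
    kind: str          # "income" or "expense"
    amount_cents: int  # always positive


class TransactionService:
    """Command side: append-only event log plus publishing to the bus."""

    def __init__(self, bus: queue.Queue) -> None:
        self.event_log: list[TransactionRecorded] = []
        self.bus = bus

    def record(self, account_id: str, kind: str, amount_cents: int) -> None:
        if kind not in ("income", "expense") or amount_cents <= 0:
            raise ValueError("invalid transaction")
        event = TransactionRecorded(account_id, kind, amount_cents)
        self.event_log.append(event)  # persist the event to keep track
        self.bus.put(event)           # hand it off; do not wait for the balance


class BalanceService:
    """Query side: keeps one precomputed balance per account."""

    def __init__(self, bus: queue.Queue) -> None:
        self.balances: dict[str, int] = {}
        self.bus = bus

    def process_pending(self) -> None:
        """Apply each queued event with one addition or subtraction."""
        while True:
            try:
                event = self.bus.get_nowait()
            except queue.Empty:
                return
            delta = event.amount_cents if event.kind == "income" else -event.amount_cents
            self.balances[event.account_id] = self.balances.get(event.account_id, 0) + delta

    def get_balance(self, account_id: str) -> int:
        """Reads never touch the transaction log."""
        return self.balances.get(account_id, 0)


bus = queue.Queue()
transactions = TransactionService(bus)
balances = BalanceService(bus)

transactions.record("acc-1", "income", 500_000)
transactions.record("acc-1", "expense", 120_000)
balances.process_pending()  # would run as a separate consumer in production
print(balances.get_balance("acc-1"))  # 380000
```

In production, `process_pending` would run continuously as a consumer inside the balance-service, and the event log and the balance table would typically live in separate databases.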

There are many other approaches to this problem; combining CQRS and Event Sourcing is a strategy that focuses on the application’s high availability while delivering the feature. However, it should not be used when the application does not need this level of availability and resilience. Overengineering and premature optimization are never a good way to go when implementing features.

CAP Theorem and the Eventual Consistency Problem

As we know, nothing in life is perfect. Even though this approach seems a good one, we need to be careful with one trade-off: eventual consistency. It should not necessarily be considered a problem, but it is a choice we need to make consciously when deciding how to build the feature. Edson Yanaga explains eventual consistency by saying that “eventually at a point in the future, all access to this (given) data item in any node (of the distributed system) will return the same value”. The key word here is “eventually”, which means you might get stale data if you fetch it before the distributed update process is done.

A very useful concept in the cloud-native world is the CAP theorem. It states that a distributed application cannot guarantee all three of the following properties at the same time:

Consistency: Every read receives the last written data, or an error.
Availability: Every request receives a successful (non-error) response, without the guarantee that the response contains the last written data.
Partition tolerance: The application will remain online despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

When building the financial app, the engineers must decide which property they are willing to give up in favor of the other two. A customer who has just completed a transaction might see a stale balance simply because the event waiting in the queue hasn’t been processed yet. There are strategies to make this problem smaller, for example synchronously triggering the balance recalculation when the user opens the UI. But at the end of the day, the application cannot fully escape the two-out-of-three rule.
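One way to sketch the mitigation mentioned above, assuming the read side tracks how far it has applied an append-only event log (similar in spirit to a consumer offset). `BalanceProjection`, `catch_up`, and the `fresh` flag are hypothetical names. A normal read returns whatever has been applied so far and may be stale; a `fresh` read first applies the pending events synchronously, trading latency for an up-to-date answer.

```python
from dataclasses import dataclass, field


@dataclass
class BalanceProjection:
    """Read model built from an append-only event log (hypothetical design).

    `position` counts how many log entries have been applied so far.
    """
    log: list[tuple[str, int]]  # (account_id, signed amount in cents)
    balances: dict[str, int] = field(default_factory=dict)
    position: int = 0

    def catch_up(self) -> None:
        """Apply every event appended since the last run."""
        for account_id, delta in self.log[self.position:]:
            self.balances[account_id] = self.balances.get(account_id, 0) + delta
        self.position = len(self.log)

    def read(self, account_id: str, fresh: bool = False) -> int:
        """Return the balance; with fresh=True, catch up synchronously first.

        fresh=False is fast but may be stale (eventual consistency);
        fresh=True adds latency to this read in exchange for freshness.
        """
        if fresh:
            self.catch_up()
        return self.balances.get(account_id, 0)


log: list[tuple[str, int]] = [("acc-1", 500_000)]
projection = BalanceProjection(log)
projection.catch_up()

log.append(("acc-1", -120_000))              # user just paid for a new bike
print(projection.read("acc-1"))              # 500000 (stale: not applied yet)
print(projection.read("acc-1", fresh=True))  # 380000
```

This narrows the stale window but does not eliminate it: events that have not yet reached the log visible to the read side are still missing, and a real implementation would need to coordinate the read-time catch-up with the background consumer.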

Final Thoughts

Building a microservices architecture is full of decisions and their trade-offs. No single solution can really solve all the problems, and we, as engineers, should not be looking for one. These days many concepts define a microservices application, and if many of them are not implemented, the application is not fully using the power of this approach.

Event-based architectures bring us many opportunities to improve the end-user experience, but there are risks that need to be considered when designing these applications. Again, trade-offs. Luckily for us, there is a vast amount of documentation out there on design patterns that already anticipate and solve the problems that inevitably come with the proposed solution. It’s our responsibility as engineers to keep that documentation at hand and to keep track of our decision-making process.

CQRS is a good example: it’s a great solution, but it is not a silver bullet either. In fact, the old-school CRUD style is still the better solution in the vast majority of cases. The key difference between CRUD and CQRS is that the former usually fits the needs with a single, simple domain model. CQRS requires different models to write and read data, which can (and usually does) make your domain more complex, whereas CRUD lets us reuse the same model for both operations, relying on the data being read back with the same schema it was written with. As always in software engineering, we should not make the software more complex than it needs to be.


Wesley Fuchter

Wesley Fuchter is a Senior Principal Engineer at Modus Create, with over 13 years of experience building cloud-native applications for web and mobile. As a tech leader, he spends most of his time working closely with engineering, product, and design to solve customers' business problems. His experience sits at the intersection of hands-on coding, innovation, and people management, ranging from AWS, Java, and TypeScript to startups, agile, and lean practices.