Introduction:
A microservices architecture makes it possible to build highly scalable, available and resilient systems while developing and deploying each service independently, possibly using different technologies. While microservices have many advantages, they also bring difficulties in tracing, testing and logging, and especially in data sharing and transactions.
When working with microservices, each service should own its data. This prevents a shared data store from becoming a single point of failure across services. However, it also brings three problems:
- Sharing data between services
- Running transactions across services.
- Scaling, because scaling a service requires replicating its data store in each instance; otherwise the single data store shared by the instances becomes another single point of failure
To share data, REST calls can be used. However, a REST call is a synchronous operation: REST uses HTTP, and HTTP is a request/response protocol that blocks the caller until the response arrives. As you can imagine, with multiple services communicating over the network this can cause significant problems. Delays lead to higher response times, and if the caller does not apply a proper timeout, it can block indefinitely.
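To make the blocking concern concrete, here is a minimal sketch. The `slow_downstream_call` function is a hypothetical stand-in for a REST call to a slow service (it just sleeps); the point is that the caller's thread blocks until a result or a timeout, so the timeout is what keeps the caller from hanging indefinitely:

```python
import concurrent.futures
import time

def slow_downstream_call() -> str:
    # Stands in for a synchronous REST call to a slow downstream service.
    time.sleep(0.5)
    return "response"

def call_with_timeout(timeout_s: float) -> str:
    # The caller blocks on result(); the timeout bounds how long it can block.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_downstream_call)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return "timed out"
```

With a timeout shorter than the downstream delay the caller gives up; without one, it would simply wait for as long as the downstream service takes.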
In addition, querying data owned by a specific service restricts us to the query capabilities of that service's data store. For example, the data store could be Cassandra, Elasticsearch or a relational database, and the data you want to query will be stored differently in each case. This may create differences in the returned data format and in query performance.
For transactions across services, distributed transactions (two-phase commit, 2PC) can be used. However, this requires the data store in each service to support 2PC, which may not always be an option. Besides, according to the CAP theorem we cannot have consistency, availability and partition tolerance all at the same time. In most cases choosing availability and partition tolerance is a must, so we need a solution other than a distributed transaction that requires traditional consistency. That solution will lead to eventual consistency.
The main problem behind all the points above is the strong coupling between services. Could there be a better way to truly decouple services?
And for the third problem above, scalability, could there be a way to remove the state from the service and scale much more easily?
EDA comes to rescue:
An event-driven architecture (EDA), when implemented correctly, can resolve the issues mentioned above. It accomplishes true decoupling of services and leads to a better microservices architecture with more resilient services. In essence, EDA helps to:
- Create resilient services, as a service has no direct communication with other services. When a target service is down, the caller is not affected, thanks to the decoupling.
- Use asynchronous communication between services. A producing service creates an event at one point in time; a consuming service reads that event at a later time. This also allows consumers to adjust their consumption rate to their own processing power, and to replay events when needed.
- Favour using a state store for all services, removing state from the service itself and enabling better scalability.
To better explain how EDA helps to create more resilient, asynchronously communicating services, let's go into the details of EDA a bit.
First of all, let's discuss what an event is. An event is basically a change of state in the system that can be recognized, reacted to and processed.
Since events are the core components of EDA, we can ask: how do events help EDA to be asynchronous and non-blocking? An event is always created by a producer and can be consumed by one or more consumers. The important thing is that those producers and consumers are completely decoupled and don't need to know each other. An event can go directly to a subscribed consumer, or go to an event store such as Kafka, which is the preferred way most of the time. In any case, events are one-way and never expect a response. Compare this with communication over HTTP, such as REST: an HTTP call consists of a request and a corresponding response, which makes it a synchronous operation. Since an event never needs a response, an architecture built on events, such as EDA, is naturally a non-blocking, asynchronous system.
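A minimal in-memory sketch can illustrate this decoupling. The `EventChannel` class below is a toy publish/subscribe channel (not a real broker like Kafka): the producer publishes an event and never waits for a response, and producer and consumer only know the channel, not each other:

```python
from collections import defaultdict
from typing import Callable

class EventChannel:
    """Toy in-memory pub/sub: producers and consumers only know the channel."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # One-way, fire-and-forget: the producer never expects a response.
        for handler in self._subscribers[event_type]:
            handler(payload)

channel = EventChannel()
received = []
channel.subscribe("OrderCreated", lambda event: received.append(event))
channel.publish("OrderCreated", {"order_id": "42"})
```

In a real system the channel would be a durable broker and consumption would happen on the consumer's own schedule, but the shape of the interaction is the same.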
Using an event store in EDA makes things easier, as all producers and consumers only need to know about the event store. Besides, by keeping all events in the event store we can make the services truly stateless. That in turn leads to truly scalable services, since without state, creating a new instance of a service is much easier. When we say an event store that is used by all producers and consumers, you may think that we are creating another single point of failure. That might be true if the event store is not itself an easily scalable and resilient solution.
I would favour Kafka at this point as an event store, being an event log and a fault-tolerant, naturally scaling solution thanks to built-in partitioning. From the Kafka documentation:
Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
Note that each new type of event corresponds to a topic, in Kafka terms.
Let's leave the details of Kafka, as general and as an event store, for another post and continue with the details of EDA.
With a scalable and resilient event store, consumers can replay events when there is an error or when they were simply offline, favouring the availability of the system.
One important aspect of an event is that it must be immutable. Therefore the event store should also support immutability, which can be achieved with the concept of an event log. In an event log you cannot change or delete an event after it has occurred; any update to the state of an application is a new event. The latest state of an application can be obtained by replaying the events in its entire history.
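Here is a minimal sketch of that idea, using a hypothetical bank-account example: events are immutable, the log is append-only, and the current state is never stored directly but derived by replaying the history:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True makes each event immutable
class Event:
    type: str    # "Deposited" or "Withdrawn"
    amount: int

class EventLog:
    """Append-only log: events can be added and replayed, never changed or deleted."""
    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self) -> int:
        # The latest state (the balance) is derived by replaying the full history.
        balance = 0
        for e in self._events:
            balance += e.amount if e.type == "Deposited" else -e.amount
        return balance

log = EventLog()
log.append(Event("Deposited", 100))
log.append(Event("Withdrawn", 30))
```

An "update" to the balance is simply another appended event; replaying the two events above yields the current balance of 70.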
Now, let’s remember the issues we listed at the beginning of this post. We asked;
- Could there be a better way to decouple services truly?
- Could there be a way to remove the state from the service and scale much more easily?
And we pointed out the problems of:
- Sharing data
- Transactions across services
- Scaling
Here is how EDA solves these problems:
- It truly decouples the services: they don't need to know about each other. As producers and consumers creating and consuming events, the services only need to know about the event store.
- As state is logged in the event store, the stateless services can scale easily by simply creating new instances.
- Data can be shared asynchronously using events and an event store. Note that a materialized view is always an option for pre-joined data across services. Such a materialized view can be kept in a local database; to keep it up to date, a consumer subscribes to the corresponding events in the event store.
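The materialized-view idea can be sketched as below. The event names and fields (`customer_id`, `order_id`) are hypothetical; the point is that the view is built and kept fresh by consuming events, so reads never call another service:

```python
class CustomerOrdersView:
    """A local materialized view of joined customer and order data,
    kept up to date by consuming events rather than querying other
    services at read time."""
    def __init__(self) -> None:
        self._view: dict[str, dict] = {}

    def on_customer_created(self, event: dict) -> None:
        # Consumed from the customer service's events.
        self._view[event["customer_id"]] = {"name": event["name"], "orders": []}

    def on_order_created(self, event: dict) -> None:
        # Consumed from the order service's events.
        self._view[event["customer_id"]]["orders"].append(event["order_id"])

    def get(self, customer_id: str) -> dict:
        # Reads hit only the local view; no cross-service call is needed.
        return self._view[customer_id]

view = CustomerOrdersView()
view.on_customer_created({"customer_id": "c1", "name": "Ada"})
view.on_order_created({"customer_id": "c1", "order_id": "o1"})
```

The view is eventually consistent: it lags the source services by however long event delivery takes, which is the trade-off accepted in exchange for fast local reads.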
- Transactions that span multiple services can be handled in an eventually consistent manner using events. For example, a producer creates an event for a change, and the corresponding consumer reacts to that event to complete the transaction asynchronously.
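The list above can be made concrete with a small sketch of such an eventually consistent transaction. The service names and event types (`OrderPlaced`, `PaymentCompleted`) are hypothetical, and a simple queue stands in for the event store:

```python
from collections import deque

events = deque()      # stands in for the event store
order_status = {}     # order service's local state
payments = {}         # payment service's local state

def place_order(order_id: str, amount: int) -> None:
    # The order service records the order and publishes an event;
    # it does not call the payment service directly.
    order_status[order_id] = "PENDING"
    events.append(("OrderPlaced", {"order_id": order_id, "amount": amount}))

def payment_consumer(event) -> None:
    # The payment service reacts to OrderPlaced at its own pace.
    etype, data = event
    if etype == "OrderPlaced":
        payments[data["order_id"]] = data["amount"]
        events.append(("PaymentCompleted", {"order_id": data["order_id"]}))

def order_consumer(event) -> None:
    # The order service completes the transaction when payment is done.
    etype, data = event
    if etype == "PaymentCompleted":
        order_status[data["order_id"]] = "COMPLETED"

place_order("o1", 100)
# Until all events are consumed, the order is only eventually consistent
# (it stays PENDING while the payment is still in flight).
while events:
    event = events.popleft()
    payment_consumer(event)
    order_consumer(event)
```

No service waits on another; the overall transaction completes whenever the events have propagated, which is exactly the eventual-consistency trade-off described above.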
Additionally, EDA has the following benefits for a microservices architecture when implemented with a sophisticated event broker:
- Easy asynchronous communication of services
- No API change issues, as there are no direct calls between services
- Independent development of components, i.e. no API dependency on other services
- Better resiliency and fault tolerance
- Better scaling with stateless services
- Reduced complexity by delegating some responsibilities to the event broker
Conclusion:
With its ability to create truly decoupled services, EDA is a perfect match for a reliable, fault-tolerant and scalable microservices architecture, solving common issues such as service-to-service communication and API changes. An event can be any state change happening in real time, such as a signal created by a sensor, a dropped packet on a network, a security violation on a file system, or a new tweet on Twitter. EDA can be used in any type of application that can be modeled using events.
As you can imagine, not all systems fit completely into an event-driven, asynchronous, eventually consistent architecture. In that case EDA can be combined with a synchronous solution like REST, for operations that need traditional consistency with ACID transactions.
If you want to see an event-driven microservices architecture live, you can check my course on Udemy: Event-Driven Microservices: Spring Boot, Kafka and Elasticsearch