Domain Events versus Change Data Capture

The building of change data capture (CDC) and event based systems have recently come up several time in my discussions with people and in my online trawling. I sensed enough confusion around them that I figured this was worth talking about here. CDC and event based communication are two very different things which look similar to some extent, and hence the confusion. Beware – using one for the other can lead to very difficult architectural situations.


What are these things?

Change Data Capture (CDC) typically alludes to a mechanism for capturing all changes happening to a system’s data. The need for such a system is not difficult to imagine – audit for sensitive information, data replication across multiple DB instances or data centres, moving changes from transactional databases to data lakes/OLAP stores etc. Transaction management in ACID compliant databases is essentially CDC. A CDC system is a record of every change every made to an entity and the metadata of that change (changed by, change time etc).

I have written about events on this blog before and have described them as announcements of something that has happened in the system domain, with relevant data about that occurrence. At a glance, this might seem to be the same as CDC – something changes in a system and this needs to be communicated to other systems – which is exactly what CDC is about. However, there is a key distinction to be made here. Events are defined at a far higher level of abstraction than data changes because they are meaningful changes to the domain. Data representing an entity can change without it having any “business” impact on the overall entity that the data represents. e.g. There can be several sub-states of an order that an order management system might maintain internally but which do not matter to the outside world. An order moving to these states would not generate events but changes would be logged in the CDC system. Vice-versa, there are be states that the rest of the world cares about (created, dispatched etc) and the order management system explicitly exposes to the outside world. Changes to or from these states would generate events.

The difference can be stated explicitly in terms of system boundaries. When we design microservices or perform any system decomposition, we are trying to identify and isolate bounded-contexts or business domains from each other. This is the basis of all domain-driven design.

CDC is about capturing data changes within a system’s bounded context, usually in the terms of the physical model. The system is recorded changes to its own data. Even if we have a separate service or system which stores these changes (some sort of platformized audit store), the separation is an implementation detail. There is a continuity of domain modelling between the actual data and changes to it, hence both belong logically inside the same boundary.

Use CDC within bounded context, events outside it

Events, on the other hand, are domain model level broadcasts emitted by one bounded context to be consumed by other bounded contexts. These represent semantically significant events in a language that the external systems can understand and respond to. That they are published over the same messaging medium, use similar frameworks, maybe get persisted somewhere etc are all implementation details.


Support this blog on Patreon


What about CQRS

What about building a CQRS style system? For the uninitiated, CQRS (Command Query Responsibility Segregation) is an architectural style where the data model and technologies used for writes (Command) are different from those used for reads (Query). Such a design is typically used when there is large difference in write patterns and to be supported and the read patterns to be supported. I have given a brief example of such a system in my case study on the nuts and bolts of using asynchronous programming. Updates to the command model and propagated to the read model, typically (but not necessarily) asynchronously. Can we use CDC for this? or should the command module emit events which are read by query module to build its data model?

CQRS can use CDC since models lie within same bounded context

I would argue that since the command-query model separation is internal design of the system, both models lie inside the same bounded context and using CDC logs would not be inappropriate. Both producer and consumer are at the same level of abstraction (both are data stores, though one may be MySQL and the other ElasticSearch), so using DB level change logs is not a bad idea. This is of course, just an opinion. Using event here would not be bad either, especially if different teams manage the models.
The command module should always emit events anyway, if nothing else, then for lending evolutionary characteristics to the overall architecture.

Building CDC and Eventing Systems

In modern distributed setups, change data is typically published over a messaging medium like Kafka and can then be consumed by other systems which want to store this data. A very popular and efficient way of building CDC systems is by using tailing the internal log files of databases (MySQL and other relational DBs always have this for transaction management, ElasticSearch has a change stream in its newer versions) using something like Filebeat and then publishing the logs over Kafka. The other side typically has Logstash type plugins to ingest data into other system which persist this change log. Consumers may also be Spark/Flink style streaming applications which consume and transform this data into a form suitable for other use cases.

This is obviously not always possible since not all databases have change log files to stream. For these system we must resort to adding code to the application layer itself to emit the change log. Making sure that there is no case where data gets changed but log is not emitted is a very hard problem to solve (essentially an atomic update problem : how to make sure that DB update and event emission over Kafka both happen or nothing happens). Lossless-ness is critical in a CDC system.

To build an event based system, we would have the event generation logic at the applications layer like we did for CDC in case of databases that don’t have log files. That is the only place where we can translate the language of the database to the language of the domain. Same as for CDC, preventing event loss in the publisher is key to the design.

However. some people propose to use CDC stream as a system’s event stream, and this is where I completely disagree due to all the reason I have mentioned above. This would couple other systems to our system’s physical data model, and we would have to forever keep our public entities same as the database model. This severely reduces the expressiveness of our domain model. Consider an order getting cancelled. The CDC system will record something like

ChangeLog{“order number” : “12345”, “changed field” : “state”, “old value”: “in progress”, “new value” : “cancelled”}

If I were to express this in my domain language of what can or cannot happen to orders, I would ideally something like

OrderEvent {“order number” : “12345”, “event type” : “order cancelled”}

But this abstraction would just not be possible if we physically couple the language of transmission to CDC language.


Summary

One of the things to remember in building software is this : sometimes things that look similar and use similar tools to function are not the same. Especially when working with logical and physical models, we should be careful to isolate the implementation detail from that which is being implemented. Look hard at the publisher and consumer of the record that we are publishing – if they are both defined at “data store” level, we are probably talking CDC. If that more of business constructs (bounded contexts) like order, courier, invoice etc, we are likely in events-ville.

Read Next : Messaging terms defined precisely

Leave a Reply