CQRS overview

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the architecture category.

Last Updated: 2023-12-03

What is CQRS?

CQRS stands for Command Query Responsibility Segregation. At its heart is the notion that you can use a different model to update information than the model you use to read information.

Commands can't return anything and queries cannot mutate state. (Similar to Redux in some ways.)

Simple examples

(The following example lives in a single codebase. More often, things are split into separate repos.)

 // Before applying CQRS
 interface InsurancePolicyService {
     void ConvertOfferToPolicy(ConvertOfferRequest convertReq);
     PolicyDetailsObject GetPolicy(long id);
     void AnnexPolicy(AnnexRequestDto annexReq);
     List SearchPolicies(PolicySearchFilter filter);
     void TerminatePolicy(TerminatePolicyRequest terminateReq);
     void ChangePayer(ChangePayerRequest req);
     List FindPoliciesToRenew(RenewFilter filter);
 }

 // After applying CQRS, we get two entities
 interface PolicyCommandService {
     void ConvertOfferToPolicy(ConvertOfferRequest convertReq);
     void AnnexPolicy(AnnexRequestDto annexReq);
     void TerminatePolicy(TerminatePolicyRequest terminateReq);
     void ChangePayer(ChangePayerRequest req);
 }

 interface PolicyQueryService {
     PolicyDetailsObject GetPolicy(long id);
     List SearchPolicies(PolicySearchFilter filter);
     List FindPoliciesToRenew(RenewFilter filter);
 }

Often a Mediator pattern is used. The role of the Mediator is to ensure the delivery of a Command or Query to its Handler. The Mediator receives a Command/Query, which is nothing more than a message describing some intent, and passes this to a Handler which is then responsible for using the domain model to perform the expected behavior.
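A minimal sketch of that mediator flow, in Python. The names (TerminatePolicy, Mediator, the handler class) are illustrative, not from any particular library:

```python
from dataclasses import dataclass

# A command is nothing more than a message describing some intent
@dataclass
class TerminatePolicy:
    policy_id: int

class Mediator:
    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        self._handlers[command_type] = handler

    def send(self, command):
        # Deliver the command to whichever handler was registered for its type
        self._handlers[type(command)].handle(command)

class TerminatePolicyHandler:
    def __init__(self):
        self.terminated = []

    def handle(self, command):
        # A real handler would use the domain model here
        self.terminated.append(command.policy_id)

mediator = Mediator()
handler = TerminatePolicyHandler()
mediator.register(TerminatePolicy, handler)
mediator.send(TerminatePolicy(policy_id=42))
```

Note that `send` returns nothing: commands can't return anything, in keeping with the segregation described above.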

View tables can be mapped directly onto UI elements instead of being normalized: each visible field might correspond to a column, and this data is precomputed.

Sometimes it makes sense to have separate storage engines for queries and commands, e.g. Elasticsearch for the query side, and Postgres or an event store (if event sourcing) for the command side. With event sourcing, we append events (changes) to a sequential list of past events. This way we not only know the current state of the system but can easily track how we reached this state. Projection is the process of converting (or aggregating) a stream of events into a structural representation.
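A toy projection in Python, assuming a bank-account-style event stream (the event names here are made up for illustration):

```python
# An append-only stream of past events (changes) for one aggregate
events = [
    {"type": "AccountOpened", "id": "A1", "balance": 0},
    {"type": "MoneyDeposited", "id": "A1", "amount": 100},
    {"type": "MoneyWithdrawn", "id": "A1", "amount": 30},
]

def project(events):
    """Fold the event stream into a structural representation (a read model)."""
    state = {}
    for e in events:
        if e["type"] == "AccountOpened":
            state[e["id"]] = e["balance"]
        elif e["type"] == "MoneyDeposited":
            state[e["id"]] += e["amount"]
        elif e["type"] == "MoneyWithdrawn":
            state[e["id"]] -= e["amount"]
    return state

view = project(events)
```

The stream itself is never mutated; the read model is always derivable by replaying it.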

View the football game image architecture here: https://altkomsoftware.pl/en/blog/cqrs-event-sourcing/

Commands vs Events

Commands either succeed and generate events, or they fail.

Events cannot fail since you cannot change the past.

Commands are, eh, commands (or requests for something to be done), and their naming will be imperative: "AnswerQuestion!"

Events are in the past tense: "QuestionAnsweredCorrectly".

Another way of putting things: Commands are how things happen. Events are what has happened.
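The distinction can be sketched like so (hypothetical names, following the AnswerQuestion example above):

```python
class CommandRejected(Exception):
    """Commands either succeed and generate events, or they fail."""

def answer_question(question, answer):
    # Imperative name: this is a request for something to be done
    if answer is None:
        raise CommandRejected("no answer given")
    if answer == question["correct"]:
        # Past-tense name: this is a record of what has happened
        return [{"event": "QuestionAnsweredCorrectly"}]
    return [{"event": "QuestionAnsweredIncorrectly"}]

events = answer_question({"correct": "42"}, "42")
```

The command may be rejected before anything happens, but once an event is emitted it is a fact.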

Strategies in event design

This is just one perspective... there are more out there.

There are three categories that tend to make sense when thinking about the design of your events:

  1. Change notification, no data. For example "Updated details for account X". Consumers of these events will then make requests to find the latest state. Change notification events should be emitted after the change has been written and eventual consistency has been reached.
  2. Deltas: only the data that changed is sent. For example "Updated postal address for account X: {data structure…}" so that stateful consumers can update without further requests. This tends to suit a specific purpose, and can kick off a series of other calls if not all relevant info is in the delta. A variation on deltas is to include oldValue/newValue so that out-of-sequence or multiple messages emitted are more easily processed.
  3. Complete model dump: "Updated account X: {data structure with everything for account X}". This is obviously very heavyweight in terms of data transfer — this gets chosen where you want a fan-out pattern, so that instead of lots of consumers making large REST/RPC calls all at once, your core system emits the data once and notification system carries the burden of data transfer.

How might event producers and consumers communicate?

Using a message broker like Kafka. Kafka is better than producers communicating directly with consumers because a Kafka cluster can keep itself together even if one Kafka server crashes, meaning the messages will still get consumed. But with a direct producer-to-consumer architecture, the crash of a consumer can cause messages to be lost forever.

How do producers and consumers find each other?

Tools like Kafka give you a level of abstraction between your producers and your consumers. Both only need to know how to find Kafka, and the system then configures itself based on what Kafka says. If your producers talk straight to your consumers, they not only need to know who all the consumers are; they also need to be updated whenever you add a consumer.
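A minimal in-memory stand-in for that decoupling (this is a sketch of the topic abstraction, not of Kafka's actual API):

```python
from collections import defaultdict

class Broker:
    """Producers and consumers only know the broker, never each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("policies", received.append)
broker.subscribe("policies", lambda m: received.append(m.upper()))

# The producer publishes to a topic; it has no idea how many consumers
# exist, so adding a consumer never changes producer code.
broker.publish("policies", "policy-terminated")
```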

Example use cases

If your use case is not here, be careful you are not fooling yourself into architectural gold-plating for functionality you don't need.

1. Require independent reads and writes (due to lopsided performance or complexity)

Data is constantly being written (commands) into the system in huge quantities. You want to present a "snapshot" of the data, because trying to do it "realtime" for all your users will demand too many resources, and your system will grind to a halt. Therefore you create the "snapshot" every 10 seconds, from code in an autonomous component, and store it as a serialized object in a data store. Like a cache of some sort. When a query asks the system, the system loads up the "snapshot", deserializes it, and returns that representation.

That is, this allows you to scale the query side independently and to better handle concurrent access, since reads no longer block writes or the command side.

Another perspective is complexity: reads are usually much simpler than writes (which need business logic).

2. Prevent over-fetching of data in views

The issue here is that coupling reads and writes forces them to share the domain model's data structure, which can cause problems with over-fetching data.

I guess an example of this is my NotesFile shared read/write object fetching the very long full text in many contexts where this data was unnecessary and just wasted RAM. With CQRS we can optimize this by working with the minimal data we need on each side.
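Sketching that NotesFile case with a separate query-side model (the field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class NotesFile:
    # Write-side model: the full domain object, including the long text
    id: int
    title: str
    full_text: str

@dataclass
class NotesFileListItem:
    # Query-side read model for an index view: only what the UI shows
    id: int
    title: str

def to_list_item(note):
    return NotesFileListItem(id=note.id, title=note.title)

note = NotesFile(id=1, title="CQRS", full_text="..." * 10_000)
item = to_list_item(note)
```

A list view rendering hundreds of `NotesFileListItem`s never pays for `full_text`.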

3. Require great auditing, reproduce intermediate states (plus time-travel debugging)

Works because everything is an event you can persist. This can be great for high-compliance applications where you might want to roll back writes and reconstruct earlier states.

What might a realistic great-audit-required situation be? Med tech and regulated domains (e.g. notifications). If a doctor submits a prescription, we can show all side effects that happened for visibility (i.e. this triggered a lab task, a push notification, sent this message). We can verify this in the UI and see what happened for each patient without relying on assumptions.

Can be helpful for feeding machine learning engines.

(That said, trigger-based DB logs could do that in regular CRUD applications, along with various other technologies.)


Potential difficulties

How to avoid calculating data from scratch?

The solution is to create snapshots from time to time, e.g. one snapshot for every 10 events. For example, if you were measuring events that affect the balance in a bank account, you might have withdraw and credit events. The snapshot would give you the balance after every 10 of these.
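The bank-account version of that, sketched in Python (event shapes and the every-10 interval are just for illustration):

```python
SNAPSHOT_EVERY = 10

def apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "credit" else balance - amount

events = [("credit", 10)] * 25  # 25 credit events of 10 each

# Store one snapshot of the balance after every 10 events
snapshots = {}
balance = 0
for i, event in enumerate(events, start=1):
    balance = apply(balance, event)
    if i % SNAPSHOT_EVERY == 0:
        snapshots[i] = balance

def current_balance(events, snapshots):
    """Replay from the latest snapshot instead of from scratch."""
    start = max(snapshots, default=0)
    balance = snapshots.get(start, 0)
    for event in events[start:]:
        balance = apply(balance, event)
    return balance
```

Here only the 5 events after the last snapshot need replaying, rather than all 25.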