So – after quite some time working with distributed systems, I want to share the decisions you’ll need to make as you build an event-based distributed system.

In particular, I want to share some of the things I’ve noticed and my general thoughts on the different approaches – what they look like, their pros and cons, and so on.

Command vs Event

First things first – a Command and an Event are two distinct things. An Event is a fact that something happened and therefore can never be “invalid” – we might not be interested in it, but it is never in an invalid state. A Command, on the other hand, is a request for something to happen, passed as a Message.

Message Envelope

This is the typical message envelope that I prescribe.

{
  // the type of event or command i.e. JobPosted or PostJob if it's a command
  type: string, 

  // so clients know how to handle our message
  version: string,

  // an id that uniquely identifies a message
  id: string,

  // a globally unique id for easier debugging
  correlationId: string / guid,

  // what generated this event / command
  source: string,

  // when the event happened or when the command was issued
  timeStamp: long / DateTime,

  // extra meta data to describe the event / command if needed
  attributes: object,

  // the actual payload
  payload: object
}

Let’s briefly look at how you might utilise these properties and where they should come from.

Type and Version together identify the message so that the client knows how to handle it.

Id is an abstract concept that can be used for deduping messages on the other end. This is typically implemented as a type of hash using various properties of the message. A very simple example for a user-generated event would be the hash of timestamp, type and user id assuming that a user can’t feasibly invoke the same type of event at the exact same time.
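As a rough illustration, a helper along these lines could derive such an id – the exact inputs to the hash are up to you, and the names here are made up:

using System;
using System.Security.Cryptography;
using System.Text;

public static class MessageId
{
    // A deterministic id derived from the message's own properties – the same
    // logical event always hashes to the same id, so consumers can dedupe it.
    public static string From(DateTime timeStamp, string type, string userId)
    {
        var input = $"{timeStamp:O}|{type}|{userId}";
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}

Something like MessageId.From(DateTime.UtcNow, "JobPosted", "123") then gives every consumer the same id to dedupe on.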

CorrelationId is generated by the originator of the event and is propagated throughout the rest of the system. This allows us to easily trace on a per-event basis anywhere within our system, given enough logging.
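In practice this just means copying the field across whenever a consumer produces its own messages – a minimal sketch, assuming an envelope class shaped like the JSON above and an injected _publisher (both illustrative):

public async Task Handle(MessageEnvelope incoming)
{
    var outgoing = new MessageEnvelope
    {
        Type = "JobIndexed", // a made-up downstream event
        Version = "1",
        Id = Guid.NewGuid().ToString(),
        CorrelationId = incoming.CorrelationId, // propagated, never regenerated
        Source = "job-indexer",
        TimeStamp = DateTime.UtcNow,
        Payload = new { jobId = 567 }
    };

    await _publisher.PublishMessage(outgoing);
}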

Source is set each time a new message is produced and uniquely identifies the application and/or user that generated it. If a particular event or command can be published by more than one application, this becomes valuable for tracking down bugs upstream, and it makes filtering logs a bit easier.

Timestamp is invaluable in message-based systems – it allows us to find bottlenecks within our system and is an additional data point that we can use for lag compensation with out-of-order messages.

Topic Segregation

One of the other things to consider when going for an event/message based system is topic segregation, i.e. what events should be published to which topic?

Topic Per Event Type

In this segregation setup, every event type, i.e. JobPostUpdated, JobPostCreated etc., is published to its own topic.

This gives consumers more granularity in terms of the event types they want to listen to, without relying on tech-specific features – although arguably, most messaging solutions such as AWS SNS/SQS and Azure Service Bus have their own filtering capabilities, making this a non-issue in most cases.
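As a sketch of what publishing looks like under this setup, the topic name can be derived straight from the event type – the publisher abstraction and the naming convention here are just assumptions:

// "JobPostCreated" -> "job-post-created", "JobPostUpdated" -> "job-post-updated"
public Task Publish(MessageEnvelope message)
{
    var topicName = ToKebabCase(message.Type);
    return _publisher.PublishToTopic(topicName, message);
}

private static string ToKebabCase(string type) =>
    string.Concat(type.Select((c, i) =>
        i > 0 && char.IsUpper(c) ? "-" + char.ToLower(c) : char.ToLower(c).ToString()));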

Pros

Cons

Topic Per Entity

The idea here is to publish all event types for a given entity to a single topic, for example Job, User, Recruiter etc.

Although consumers have no tech-agnostic way of filtering out event types they don’t want to listen to, as mentioned previously, this is usually not an issue, as most messaging solutions have a filtering feature – and if not, the consumer can just discard the irrelevant events anyway.
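A sketch of that consumer-side discarding – IndexJob and ReindexJob are made-up handlers, and the envelope is the one from earlier:

// a single "job" topic carries every job-related event type
public Task Handle(MessageEnvelope message)
{
    switch (message.Type)
    {
        case "JobPostCreated":
            return IndexJob(message.Payload);
        case "JobPostUpdated":
            return ReindexJob(message.Payload);
        default:
            // an event type this consumer doesn't care about – discard it
            return Task.CompletedTask;
    }
}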

Pros

Cons

Event Message Content

One of the things to decide when communicating via messaging is what to include in the message content – and you mainly have three ways of going about this.

1. Entire State

The idea is to simply include the entire state of the entity relating to the event. For example, you may have a JobPostUpdated event – this will include the entire state of the Job.

{
  posterUserId: 123,
  jobId: 567,
  title: 'Job Title',
  description: 'Job Description',
  tags: ['c#', 'javascript', 'developer']
}

As soon as we add a new property to the entity, we simply add it to the message – simple enough, and there’s no need to change the version as it’s still compatible with old clients.

But what if the clients now need to know the first name and last name of the user that posted the job? We now have to add it to the payload.

{
  posterUserId: 123,
  poster: {
    firstName: 'Jon',
    lastName: 'Doe'
  },
  jobId: 567,
  title: 'Job Title',
  description: 'Job Description',
  tags: ['c#', 'javascript', 'developer'],
}

Well… that user is probably associated with some sort of organisation – and the clients downstream now need that as well, as part of the requirements for project ABC, etc. Where does it end?

Pros

Cons

2. Changes Only

The idea here is to publish only the state that has changed instead of the entire payload – so instead of a single JobPostUpdated event, we split this up into smaller, cohesive events such as JobPostTitleChanged, JobPostDescriptionChanged, JobPostTagsChanged etc.

JobPostTitleChanged

{
  jobId: 567,
  title: 'New Job Title',
}

There’s an assumption made here that all clients are aware of, and up to date with, the previous complete state of the entity.
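In practice, that means each consumer keeps its own copy of the entity and applies the change on top of it – a sketch, where _jobStore stands in for whatever local read model the consumer maintains (illustrative):

public async Task Handle(JobPostTitleChanged message)
{
    var job = await _jobStore.Get(message.JobId);
    if (job == null)
    {
        // we never received the full state – this is where the
        // "clients are up to date" assumption breaks down
        return;
    }

    job.Title = message.Title;
    await _jobStore.Save(job);
}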

Pros

Cons

3. ID Only

Only publish the ID of the entity that changed, for example JobPostUpdated would only contain:

{
  jobId: 567
}

The typical mechanism to get further data looks like this:

1. Publisher --> Publish Event --> Client
2. Client --> GET --> Source of Truth

The producer publishes an event, the client receives the event with the entity id, and it grabs the entire state by asking the source of truth.
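A sketch of that callback, using System.Text.Json for deserialisation – the jobs API URL, the Job type, _httpClient and UpdateLocalReadModel are all assumptions for illustration:

public async Task Handle(JobPostUpdated message)
{
    // ask the source of truth for the full, current state
    var response = await _httpClient.GetAsync($"https://jobs.example.com/api/jobs/{message.JobId}");
    response.EnsureSuccessStatusCode();

    var job = JsonSerializer.Deserialize<Job>(await response.Content.ReadAsStringAsync());

    await UpdateLocalReadModel(job);
}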

Pros

Cons

Publishing Messages

Choosing your approach to publishing messages is a critical decision – depending on requirements, you may have to guarantee delivery of messages.

Part of DB Transaction

Simply include the publishing of the message as part of the DB transaction – a pseudo-code example looks like this:

using(var db = _dbConnectionFactory.GetWriteConnection())
{
    db.BeginTransaction();
    try 
    {
        // do a bunch of SQL queries
        var myMessagePayload = somePayload;

        // we're doing this as part of the transaction before committing
        await _publisher.PublishMessage(myMessagePayload);
        db.Commit();
    }
    catch (Exception ex)
    {
        db.Rollback();
        throw;
    }
    finally
    {
        db.Close();
    }
}

Pros

Cons

Outside of DB Transaction

Simply publish the event outside the DB transaction and hope that it publishes – this is very much the simplest approach, but of course, you lose the guarantee of message delivery. You could add retries around the publishing to reduce the chances of losing an event (a sketch of this follows the example below).

using(var db = _dbConnectionFactory.GetWriteConnection())
{
    // declared outside the try block so it stays in scope for the publish after the transaction
    object myMessagePayload = null;

    db.BeginTransaction();
    try 
    {
        // do a bunch of SQL queries
        myMessagePayload = somePayload;
        db.Commit();
    }
    catch (Exception ex)
    {
        db.Rollback();
        throw;
    }
    finally
    {
        db.Close();
    }

    // we're doing this outside the transaction
    await _publisher.PublishMessage(myMessagePayload);
}
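A simple retry around that publish call might look something like this – the attempt count and delay are arbitrary:

// reduces, but does not eliminate, the chance of losing a message
async Task PublishWithRetry(object payload, int maxAttempts = 3)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            await _publisher.PublishMessage(payload);
            return;
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // back off briefly and try again
            await Task.Delay(TimeSpan.FromSeconds(attempt));
        }
    }
}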

Pros

Cons

Out-of-process Event Tailing

The idea here is to have the application that produces the events write an event audit as part of the DB transaction. This means guaranteed storage of all events that have ever happened.

There will be another process in place that tails the event audit, publishing each event and then marking it as published.

This is arguably the most time-consuming approach, but it sounds like the proper solution to the problem, as it guarantees message delivery and does not affect application performance in any way, bar the extra time needed to write the event audit.
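A sketch of both halves – the EventOutbox table, the Serialize/Deserialize helpers and the Dapper-style Execute/Query calls are all illustrative:

// inside the application, written as part of the same DB transaction as the entity change
db.Execute(
    "INSERT INTO EventOutbox (Id, Type, Payload, Published) VALUES (@Id, @Type, @Payload, 0)",
    new { message.Id, message.Type, Payload = Serialize(message) });
db.Commit();

// in a separate process, tail the audit: publish anything unpublished, then mark it as published
var pending = db.Query<OutboxRow>("SELECT * FROM EventOutbox WHERE Published = 0");
foreach (var row in pending)
{
    await _publisher.PublishMessage(Deserialize(row.Payload));
    db.Execute("UPDATE EventOutbox SET Published = 1 WHERE Id = @Id", new { row.Id });
}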

Pros

Cons