4 Strategies for Future Events with Event Sourcing
Most systems will have to deal with future or deferred events. Even the most common example of software on the web, a blog, will have the ability to "publish" a post at a certain time in the future.
When that time is reached, the read model should return "published". How does the read model do this? Compute the current time and override the status? What if we need a Reactor to do something once the post is published? Below are some strategies you can use, depending on the system and the complexity required.
But first, this problem is not exclusive to event sourcing. Take the world's most popular blogging software, WordPress: it has two key fields in the `wp_posts` table, `post_status` and `post_date`. When the editor sets a post to go live in the future, the `post_status` is "future" and the `post_date` is given the timestamp of when the post will go live. Simple! How does the `post_status` change from "future" to "publish"? WordPress seems to run a check on every request, updating the status of any post whose `post_date` is in the past (it is WordPress, after all!). This actually brings us to our first strategy.
Note: each strategy I explore solves a problem created by the previous one, let's gooo!
Strategy 1: Update our read model
Time processing is deferred to a dedicated cron script. This could be as simple as executing the following SQL query every minute / hour / day.
UPDATE posts SET status = 'published' WHERE status = 'future' AND post_date < NOW();
Strategy 2: Compute in the Read Model
Relying on a cron job means our read model will be outdated until the cron job runs. Running the script on every request (à la WordPress) comes with its own performance problems.
We can get around this problem by computing the status on the fly. The read model can introduce a new field called "current_status", calculated whenever the model is retrieved. Here's an example of a Laravel computed attribute.
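A minimal sketch of such an accessor, assuming the `status` and `post_date` fields from the WordPress example above:

```php
use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    protected $casts = [
        'post_date' => 'datetime',
    ];

    // Computed attribute: the stored status stays "future", but we
    // report "published" once post_date has passed.
    public function getCurrentStatusAttribute(): string
    {
        if ($this->status === 'future' && $this->post_date->isPast()) {
            return 'published';
        }

        return $this->status;
    }
}

// Usage: $post->current_status
```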
Fantastic! Now our status field is super dynamic and always up to date!
But what if we need to grab all published posts? We'd have to retrieve every record from the database into memory and then display only the posts which are published.
An alternate implementation of this strategy is doing the computation in the database. Generated Columns, available in recent versions of Postgres and MySQL, are one option, though both databases require the generating expression to be deterministic, so a status that depends on the current time is better served by a view. Either way, this moves the business logic into the database layer, which may or may not suit the culture of your business.
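As a sketch, here's what that view could look like (table and column names assumed from the earlier examples):

```sql
-- The view computes the effective status at query time, so
-- SELECT * FROM posts_with_status WHERE current_status = 'published'
-- needs no application-side filtering.
CREATE VIEW posts_with_status AS
SELECT
    posts.*,
    CASE
        WHEN status = 'future' AND post_date < NOW() THEN 'published'
        ELSE status
    END AS current_status
FROM posts;
```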
Strategy 3: Scheduled Commands / Events
We’re getting to a pretty good place here, however, there’s a key thing missing from an application standpoint. What happens when we need to do something once the post is published? We might need to ping iTunes if a Podcast is published, charge a client if an order is placed etc..
We can schedule a job to be fired at the appropriate time. This is quite easy with something like Laravel’s queue system:
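Here's a sketch; the `PublishPost` job and `PostPublished` event are hypothetical names standing in for whatever your application defines:

```php
use App\Events\PostPublished;
use App\Models\Post;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\SerializesModels;

class PublishPost implements ShouldQueue
{
    use Dispatchable, Queueable, SerializesModels;

    public function __construct(public Post $post)
    {
    }

    public function handle(): void
    {
        // Nothing in the database changes until this runs.
        $this->post->update(['status' => 'published']);

        event(new PostPublished($this->post));
    }
}

// Delay the job until the scheduled publish date:
PublishPost::dispatch($post)->delay($post->post_date);
```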
This doesn’t cause any state change in the database until the desired time. Once the time is reached, the job is executed, the event is fired and the system can react to the event at the appropriate time. We might need to do some extra work here to inform users of scheduled commands for a particular post.
Strategy 4: Verbose Events
We’re getting better and more flexible, and not too complex just yet. We could probably stop here, but what if we needed a report on upcoming posts. The read model keeps the Post in the draft state until the PostPublished event fires.
The answer is to provide more verbose events. The first event indicates that a post has been scheduled for publication (past, present or future, it doesn't matter), and a second event indicates that the post is published. This gives the system two hooks into the post lifecycle: very flexible indeed.
With the `PublishPost` command, we can immediately fire one event after the other, which keeps the event listeners simple.
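A sketch of the aggregate, written in the style of Spatie's laravel-event-sourcing package (class and event names are illustrative):

```php
use Carbon\CarbonImmutable;
use Spatie\EventSourcing\AggregateRoots\AggregateRoot;

class PostAggregate extends AggregateRoot
{
    public function publishPost(CarbonImmutable $publishDate): self
    {
        // Always record the scheduling intent, past, present or future...
        $this->recordThat(new PostPublishScheduled($publishDate));

        // ...and if the date is already due, record the publication
        // immediately, one event straight after the other.
        if ($publishDate->isPast()) {
            $this->recordThat(new PostPublished($publishDate));
        }

        return $this;
    }
}
```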
Bonus Strategy: Time is a Command
At this point, we’re in a pretty good state. We’ve got verbose events, giving us a lot of flexibility in what the system should do when posts are scheduled to be published and when the posts are actually published.
There’s just one small design flaw with what our final strategy. We’ve mixed our read models and write models together, we’re dependent on a read model to execute things in our read model. We’ve also got some duplicated business logic in the aggregate root and our toBePublished
query.
In a very real world scenario, what if we had thousands of scheduled posts which was causing a performance problem on our blog, we’d update our read projection to only store published posts to optimise the query. toBePublished
would now always return an empty result and no further posts would be published!
To keep our business logic in the write model, we can treat time as a command. The aggregate root looks at the time and its current state, and determines what events it needs to record.
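Continuing the hypothetical aggregate sketch from above: a scheduler periodically tells the aggregate what time it is, and the aggregate decides, from its own state, what to record.

```php
class PostAggregate extends AggregateRoot
{
    private string $status = 'draft';
    private ?CarbonImmutable $publishDate = null;

    // The "command": tell the aggregate what time it is and let it
    // decide which events, if any, to record.
    public function timePassed(CarbonImmutable $now): self
    {
        if ($this->status === 'scheduled' && $this->publishDate->lte($now)) {
            $this->recordThat(new PostPublished($this->publishDate));
        }

        return $this;
    }

    protected function applyPostPublishScheduled(PostPublishScheduled $event): void
    {
        $this->status = 'scheduled';
        $this->publishDate = $event->publishDate;
    }

    protected function applyPostPublished(PostPublished $event): void
    {
        // Once published, timePassed() becomes a no-op.
        $this->status = 'published';
    }
}
```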
This does add a little complexity, and I don't think it adds any more flexibility to the system. Personally, I'm willing to trade some simplicity to remove the dependency; everything is a trade-off, but you already know that if you're using Event Sourcing ;)
Bi-Temporal Models
As soon as we introduce an effective date into our events, we have two time data points on every event: the effective date (when the event should take effect) and the recorded date (when the event was recorded).
For scheduled posts, a `PostPublishScheduled` event might have an effective date in February and a recorded date in January. For back-dated posts, it could have an effective date in January and a recorded date in February.
This is venturing into the world of bi-temporal event sourcing, the concept of viewing state along two time dimensions: what was published on 14 January, versus what did we think was published on 14 January?
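As an illustration, such an event simply carries both timestamps (names hypothetical):

```php
use Carbon\CarbonImmutable;

class PostPublishScheduled
{
    public function __construct(
        // When the publication should take effect.
        public CarbonImmutable $effectiveDate,
        // When the fact was recorded in the event store.
        public CarbonImmutable $recordedAt,
    ) {
    }
}
```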
Conclusion
Which strategy you choose depends entirely on your project. If you need maximum flexibility, you might need a more complex strategy. Otherwise, keep it as simple as possible!