Notification System - dual writes problem

Time:04-07

Example scenario:

A new user is created within a group (we need to ensure their email is unique, etc.).

We'd like to publish a UserCreated event (via PubSub/Kafka/RabbitMQ) to trigger some additional business logic asynchronously:

  • send confirmation email
  • notify group admin that a new user joined the group

I think we can treat the confirmation email as a fire-and-forget task, since the user can trigger it again. That's not true for notifying the group admin: losing that event may be unacceptable. We cannot simply save the new user to the database and then publish an event, because the publish can fail after the save has succeeded (the dual-write problem). We could move to a purely event-driven approach, but then I have no idea how to provide a synchronous REST API for it.

Question

How do people deal with the dual-write problem in real life when implementing a notification/event system in their apps? Does everybody really use the transactional outbox pattern with CDC (e.g. Debezium)? It seems like overkill to me, but I really can't think of a better way to tackle the problem (unless the API calls can be made fully asynchronous as well). Is polling a database table (instead of CDC) an acceptable solution? How could we scale that?

If you could share your experience or link to some example projects for reference, that would be awesome! Most of the tutorials I was able to find seem to ignore the problem entirely.

For context, I work mostly with Python (FastAPI), but analysing projects in other technologies (like Java/NodeJS) shouldn't be a big problem for me.
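For reference, the outbox-with-polling approach the question asks about can be sketched in a few lines. This is a minimal sketch under stated assumptions: Python's built-in sqlite3 stands in for the application database, and the users/outbox tables, create_user, relay_once, and the publish callback are all hypothetical (a real publish would call your broker client). The point it shows is that the user row and the event row commit in one local transaction, and a separate polling loop forwards unpublished rows.

```python
import json
import sqlite3
import uuid
from typing import Callable

# Hypothetical schema: the outbox lives in the same database as the users table,
# so a user row and its event row are committed (or rolled back) together.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id TEXT PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    group_id TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL UNIQUE,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def create_user(conn: sqlite3.Connection, email: str, group_id: str) -> str:
    """Insert the user and its UserCreated event in a single local transaction."""
    user_id = str(uuid.uuid4())
    event = {"user_id": user_id, "email": email, "group_id": group_id}
    with conn:  # commits on success, rolls back both inserts on any exception
        conn.execute(
            "INSERT INTO users (id, email, group_id) VALUES (?, ?, ?)",
            (user_id, email, group_id),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "UserCreated", json.dumps(event)),
        )
    return user_id


def relay_once(
    conn: sqlite3.Connection,
    publish: Callable[[str, str, dict], None],
    batch_size: int = 100,
) -> int:
    """One polling pass: forward unpublished events in insertion order."""
    rows = conn.execute(
        "SELECT seq, event_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, event_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried on the next pass.
        publish(event_id, event_type, json.loads(payload))
        # A crash between publish and this UPDATE re-sends the event later, so
        # delivery is at-least-once and consumers should deduplicate on event_id.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

A relay process would call relay_once in a loop with a short sleep. To scale polling beyond one relay, PostgreSQL lets several workers claim disjoint batches with SELECT ... FOR UPDATE SKIP LOCKED (SQLite has no equivalent, so this sketch assumes a single relay); CDC tools such as Debezium replace the polling query by reading the database's transaction log instead.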

Answer:

"How do people deal with dual write problem in real life?"

If you've chosen a distributed architecture, then you need to design your system around the messaging guarantees that are actually available.

Exactly-once delivery guarantees are (take your pick) impossible or prohibitively expensive. So you get to choose between "At Most Once" and "At Least Once" delivery guarantees.
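In practice, which of the two guarantees a consumer gets often comes down to whether it acknowledges a message before or after processing it (Kafka offset commits and RabbitMQ acks play this role). The toy sketch below simulates a consumer crash; ToyQueue and both consume functions are hypothetical stand-ins for a real broker client.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class ToyQueue:
    """Hypothetical in-memory broker: a message stays queued until it is acknowledged."""

    def __init__(self, messages: Iterable[str]) -> None:
        self.pending = deque(messages)

    def peek(self) -> Optional[str]:
        return self.pending[0] if self.pending else None

    def ack(self) -> None:
        self.pending.popleft()


def consume_at_most_once(q: ToyQueue, handle: Callable[[str], None]) -> None:
    msg = q.peek()
    if msg is None:
        return
    q.ack()      # acknowledge first: if handle() crashes, the message is gone
    handle(msg)


def consume_at_least_once(q: ToyQueue, handle: Callable[[str], None]) -> None:
    msg = q.peek()
    if msg is None:
        return
    handle(msg)  # process first: if this crashes, the message is still queued
    q.ack()      # a crash right before this line leads to a duplicate on redelivery
```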

At Least Once means that your subscribers need to handle receiving two (or more) copies of the same message, either because they can detect the duplicate or because the cost of processing it twice is acceptable.
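One common way to detect duplicates is to record the ids of processed messages in the same database transaction as the handler's own writes. A minimal sketch with sqlite3, assuming every event carries a unique event_id; the processed_events table, the IdempotentConsumer class, and the handler signature are hypothetical. It only protects side effects that go through that transaction: an email sent from inside the handler could still go out twice, which is where "the cost of duplicate processing is acceptable" applies.

```python
import sqlite3
from typing import Callable

Handler = Callable[[sqlite3.Connection, dict], None]


class IdempotentConsumer:
    """Skips events whose event_id has already been processed successfully."""

    def __init__(self, conn: sqlite3.Connection, handler: Handler) -> None:
        self.conn = conn
        self.handler = handler
        conn.execute(
            "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
        )

    def handle(self, event_id: str, payload: dict) -> bool:
        """Return True if the event was processed now, False if it was a duplicate."""
        with self.conn:
            seen = self.conn.execute(
                "SELECT 1 FROM processed_events WHERE event_id = ?", (event_id,)
            ).fetchone()
            if seen:
                return False
            # The handler writes through the same connection, so its changes and the
            # processed_events marker commit together or not at all.
            self.handler(self.conn, payload)
            # A duplicate that slips past the check above fails this PRIMARY KEY
            # insert, which rolls back the handler's writes as well.
            self.conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        return True
```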

Answer:

I would consider splitting user creation into two steps.

First, the client makes a synchronous API request to create the new user in the group, and the API immediately returns some kind of "task id" for that request. This simply means: okay, we got your request to create a new user and will process it. The task id can be used to query the status of the request. If the client treats it as fire-and-forget from that point on, a response confirming that the request was received may be sufficient, and the task id (or request id) may only matter inside the system, e.g. for correlation, logging, and the actual processing in the background.

When your backend receives this request, it could, for instance, put a new command on a queue (such as a create user command), or the same thing could be modelled as an event (e.g. a user creation requested event). Note that by queuing I mean the concept of queuing, so it could be implemented in different ways, for instance with a transactional outbox or some persistent message queue.
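The synchronous half of this flow might look like the sketch below, with sqlite3 standing in for a persistent queue. The tasks and command_queue tables, the "CreateUser" command name, and both functions are hypothetical. In a FastAPI app, accept_create_user would typically sit behind a POST handler that responds with HTTP 202 Accepted and the task id, and get_task_status behind a GET handler that clients can poll.

```python
import json
import sqlite3
import uuid
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id TEXT PRIMARY KEY,
    status TEXT NOT NULL,              -- 'pending', 'done' or 'failed'
    result TEXT
);
CREATE TABLE IF NOT EXISTS command_queue (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL REFERENCES tasks (task_id),
    command TEXT NOT NULL,
    payload TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
"""


def accept_create_user(conn: sqlite3.Connection, email: str, group_id: str) -> str:
    """Record the request and queue a CreateUser command; return a task id right away."""
    task_id = str(uuid.uuid4())
    with conn:  # the task row and the queued command are committed atomically
        conn.execute(
            "INSERT INTO tasks (task_id, status) VALUES (?, 'pending')", (task_id,)
        )
        conn.execute(
            "INSERT INTO command_queue (task_id, command, payload) VALUES (?, ?, ?)",
            (task_id, "CreateUser", json.dumps({"email": email, "group_id": group_id})),
        )
    return task_id


def get_task_status(conn: sqlite3.Connection, task_id: str) -> Optional[str]:
    """What a status endpoint would return; None means the task id is unknown."""
    row = conn.execute(
        "SELECT status FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that no validation beyond storing the request happens here; uniqueness of the email is checked later by whatever processes the command.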

Once this command or event sits on a reliable queue (whichever implementation you choose), you can react to the message asynchronously by actually creating the new user in the group. Once that has happened, you can publish a user created event.
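The asynchronous half could be a worker like the one sketched below: it takes the oldest unprocessed create user command, creates the user, marks the task done, and writes a UserCreated event to an outbox, all in one local transaction, so the event is recorded if and only if the user was created. All table and function names are hypothetical; a real worker would call this in a loop (or be triggered by the broker) and leave publishing the outbox rows to a relay.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE,
                    group_id TEXT NOT NULL);
CREATE TABLE tasks (task_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT);
CREATE TABLE command_queue (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                            task_id TEXT NOT NULL, command TEXT NOT NULL,
                            payload TEXT NOT NULL,
                            processed INTEGER NOT NULL DEFAULT 0);
CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                     event_id TEXT NOT NULL UNIQUE, event_type TEXT NOT NULL,
                     payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0);
"""


def process_next_command(conn: sqlite3.Connection) -> bool:
    """Handle one pending CreateUser command; return False when the queue is empty."""
    row = conn.execute(
        "SELECT seq, task_id, payload FROM command_queue "
        "WHERE processed = 0 AND command = 'CreateUser' ORDER BY seq LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    seq, task_id, payload = row
    data = json.loads(payload)
    user_id = str(uuid.uuid4())
    event = {"user_id": user_id, "email": data["email"], "group_id": data["group_id"]}
    try:
        with conn:  # user, task status, event and queue marker commit together
            conn.execute(
                "INSERT INTO users (id, email, group_id) VALUES (?, ?, ?)",
                (user_id, data["email"], data["group_id"]),
            )
            conn.execute(
                "UPDATE tasks SET status = 'done', result = ? WHERE task_id = ?",
                (user_id, task_id),
            )
            conn.execute(
                "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
                (str(uuid.uuid4()), "UserCreated", json.dumps(event)),
            )
            conn.execute("UPDATE command_queue SET processed = 1 WHERE seq = ?", (seq,))
    except sqlite3.IntegrityError:
        # In this sketch the only expected constraint violation is a duplicate email:
        # a business failure, so record it on the task instead of retrying forever.
        with conn:
            conn.execute(
                "UPDATE tasks SET status = 'failed', result = 'email already in use' "
                "WHERE task_id = ?",
                (task_id,),
            )
            conn.execute("UPDATE command_queue SET processed = 1 WHERE seq = ?", (seq,))
    return True
```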

A single component can subscribe to the user created event, or, if it makes sense in your case, separate components can: one reacting by sending the confirmation email and another by notifying the group admin. Splitting this into separate subscribers may take more implementation effort, but it also gives you more flexibility to process the same event with different performance and reliability requirements. For instance, as you mentioned, the confirmation email is less crucial than notifying the admin in your case.
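With a real broker, separate subscribers usually mean separate queues bound to a fanout exchange (RabbitMQ) or separate consumer groups reading the same topic (Kafka), so each subscriber gets its own copy of every event and its own failure handling. The in-process dispatcher below only illustrates that last point; the Subscriber and Dispatcher classes, the critical flag, and the retry list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Subscriber:
    name: str
    handler: Callable[[dict], None]
    critical: bool  # True: must eventually succeed; False: best effort


@dataclass
class Dispatcher:
    subscribers: List[Subscriber]
    retry_later: List[Tuple[str, dict]] = field(default_factory=list)
    dropped: List[Tuple[str, dict]] = field(default_factory=list)

    def dispatch(self, event: dict) -> None:
        """Deliver the event to every subscriber; one failure never blocks the others."""
        for sub in self.subscribers:
            try:
                sub.handler(event)
            except Exception:
                # Critical work is kept for another attempt; best-effort work is dropped.
                target = self.retry_later if sub.critical else self.dropped
                target.append((sub.name, event))
```

Here the confirmation email would be registered with critical=False and the admin notification with critical=True, mirroring the different requirements described above.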

The create user command (or the user creation requested event, respectively) and the user created event are then processed with whatever degree of resiliency is required to survive temporary outages and to guarantee that everything happens eventually, which leaves you with the characteristics of eventual consistency.
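That resiliency often comes down to retrying transient failures with capped exponential backoff and jitter, then parking messages that keep failing (e.g. in a dead-letter queue) for inspection. A minimal sketch; TransientError and retry_with_backoff are hypothetical names, and the sleep parameter exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, broker unavailable)."""


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, retrying only TransientError, at most max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller could move the message to a dead-letter queue
            # Exponential backoff capped at max_delay, with "full jitter" so that many
            # workers do not retry in lockstep after a shared outage.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Non-transient errors (bugs, validation failures) propagate immediately, since retrying them would only delay the same failure.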

I've applied this pattern several times, especially in e-commerce ordering processes, where clients (e.g. web or mobile front ends) need immediate, synchronous confirmation that their request has gone through, while the actual processing of that request and the notification about its completion can happen later, asynchronously.

So you could think of the request to create the new user as placing an order, actually creating the user as processing the order, sending the confirmation email as sending an order confirmation email to the customer, and notifying the admin as notifying another important player in the system about the new order, such as a logistics system that triggers the distribution of the ordered items.
