What to do when an exception is thrown after state is modified?

In our system the user places an order through a synchronous REST POST call. The service then modifies the state of the system.

Now we are struggling with how to clean up the state if the service modified it but failed in the end, e.g. due to a system shutdown.

In an asynchronous approach it would be pretty straightforward: the message from the queue would not be acknowledged as processed, so it would be retried.

However, in a synchronous approach the client has already received a 500 error and may never retry the action.

The only idea we have come up with is to have a background job doing the necessary cleanup (seems like implementing eventual consistency). What is the correct way to do that?

NOTE:

This might apply to any system, but in our case the "state modification" is actually a complex operation across multiple microservices using the saga pattern, which needs to be rolled back if something fails.

CodePudding user response：

With eventual consistency you either respond with 202 Accepted or try to await the processing of the request. Eventual consistency has "events" in its name, and the most popular way to implement it is with domain events. Whenever an HTTP request arrives, you add a domain event to the domain event queue, which is usually saved to the event storage. So you end up with a series of events there, for example UserCreated, UserProfileUpdated, UserPasswordChanged, etc. This part is more or less synchronous, or at least the event queue persists the event to guarantee that it is not lost in a power outage. You then modify your databases based on these events.

In the case of CQRS you have query databases, which are eventually consistent with the event storage. The event storage is always up to date, while the query databases may lag behind it, usually by a few milliseconds or seconds; how long a delay is acceptable depends on your business and your consumers. It can also depend on the type of event and the type of database, so there can be priority databases and events that must be processed fast, and regular events that are less urgent. For example, a password change should be almost immediate in my opinion, but a profile update can wait a few seconds.

With eventual consistency there is no rollback once the event is stored in the event storage. All you can do is compensate for it, either with later events, something like UserCreationCancelled or UserCreationFailed, or by making some sort of exception in your business process, e.g. removing the partially created user from the database manually rather than automating such rare cases. The event storage describes the past, and you cannot change the past after things have happened.

A rare and, I think, bad approach is to restore your query databases to an earlier point, remove the event from the event storage, and process the other events again. This is very complicated, and if you don't think of everything, e.g. the user was created and did something that affects other entities, you might end up with a broken timeline and broken databases.
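To make the idea of compensating with later events concrete, here is a minimal sketch in Python. The EventStore class and the project_active_users function are hypothetical names invented for this illustration, and the in-memory list stands in for a real, durable event storage. Nothing is ever removed from the log; the failed creation is undone by appending a UserCreationFailed event, and the query-side view is rebuilt by replaying the log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "UserCreated" or "UserCreationFailed"
    user_id: str


@dataclass
class EventStore:
    """Hypothetical append-only log: events are added, never edited or deleted."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def project_active_users(store: EventStore) -> set:
    """Rebuild a query-side view by replaying every event in order."""
    active = set()
    for event in store.events:
        if event.kind == "UserCreated":
            active.add(event.user_id)
        elif event.kind in ("UserCreationCancelled", "UserCreationFailed"):
            # Compensating event: undo the effect without rewriting history
            active.discard(event.user_id)
    return active


store = EventStore()
store.append(Event("UserCreated", "u1"))
store.append(Event("UserCreated", "u2"))
# Creation of u2 failed later on, so a compensating event is appended
store.append(Event("UserCreationFailed", "u2"))
active_users = project_active_users(store)
```

After the replay, only u1 is in the view, while the log still contains all three events, including the record that u2 was once created.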

CodePudding user response：

Now we are struggling with how to clean up the state if the service modified it but failed in the end, e.g. due to a system shutdown.

Without any further information, the answer would be pretty simple.
Your command handler (service) should be wrapped entirely in a transaction.
If the command fails for a technical reason, the transaction is never committed.
Therefore, no state is changed.
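As a rough illustration of wrapping a command handler in a transaction, here is a sketch using Python's built-in sqlite3 module. It assumes that all state lives in a single database; the place_order function, the table names, and the fail_midway flag are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('book', 5)")
conn.commit()


def place_order(conn, item, fail_midway=False):
    """Hypothetical command handler: both writes commit together or not at all."""
    # 'with conn' commits if the block finishes and rolls back if it raises
    with conn:
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        if fail_midway:
            raise RuntimeError("simulated technical failure")
        conn.execute("UPDATE stock SET qty = qty - 1 WHERE item = ?", (item,))


try:
    place_order(conn, "book", fail_midway=True)
except RuntimeError:
    pass  # in the REST scenario the client would receive a 500 here

orders_after_failure = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

place_order(conn, "book")
orders_after_success = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock_after_success = conn.execute(
    "SELECT qty FROM stock WHERE item = 'book'"
).fetchone()[0]
```

The failed call leaves no order row behind because the insert is rolled back together with everything else in the block. This only covers a single database; it does not by itself span several microservices.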

If your service is involved in a saga, then good practice is to persist the saga's state to the database each time it changes.
That way you can reload the saga with its last known state as soon as the server restarts after a crash, and bring the system back to a consistent state.
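Persisting the saga state could look roughly like the sketch below. It is only an illustration: the step names, the run_saga function, the SimulatedCrash exception, and the saga table are all hypothetical, and the in-memory SQLite database stands in for a durable one.

```python
import sqlite3

# Hypothetical saga steps; a real step would call another microservice
STEPS = ["reserve_stock", "charge_payment", "schedule_shipping"]


class SimulatedCrash(Exception):
    """Stands in for a process shutdown in this sketch."""


db = sqlite3.connect(":memory:")  # stands in for a durable database
db.execute("CREATE TABLE saga (id TEXT PRIMARY KEY, next_step INTEGER)")
executed = []  # records which steps ran, for illustration only


def run_saga(db, saga_id, crash_before=None):
    """Run the remaining steps, persisting progress after each one."""
    row = db.execute(
        "SELECT next_step FROM saga WHERE id = ?", (saga_id,)
    ).fetchone()
    if row is None:
        with db:
            db.execute("INSERT INTO saga VALUES (?, 0)", (saga_id,))
        next_step = 0
    else:
        next_step = row[0]  # resume from the last persisted state
    for index in range(next_step, len(STEPS)):
        if STEPS[index] == crash_before:
            raise SimulatedCrash(STEPS[index])
        executed.append(STEPS[index])
        with db:  # persist the new state before moving on
            db.execute(
                "UPDATE saga SET next_step = ? WHERE id = ?",
                (index + 1, saga_id),
            )


try:
    run_saga(db, "order-42", crash_before="charge_payment")
except SimulatedCrash:
    pass

# After the "restart": reload the saga and continue from the stored step
run_saga(db, "order-42")
final_step = db.execute(
    "SELECT next_step FROM saga WHERE id = 'order-42'"
).fetchone()[0]
```

On the second call the saga skips reserve_stock, because the stored state says that step already completed. In a real system a crash can also happen after a step ran but before its state was saved, so steps generally need to be idempotent. The reloaded saga could equally decide to run compensating actions instead of continuing forward.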