Due to duplicate events coming from an event-driven architecture, duplicate records are being saved to the database.
This is causing an issue. I added a unique constraint to the table, but now I don't know how to handle the resulting exception when using saveAll.
One thought that comes to mind is a recursive retry approach, but since it is unpredictable how many duplicate events may occur, the number of retries cannot be determined, and obviously I cannot use an infinite recursive loop.
CodePudding user response:
What you are looking for is an exactly-once delivery guarantee. This approach requires an event to be acknowledged on both the producer and the consumer side to make sure the event is not lost.
Some message brokers have this feature; for example, Kafka supports message deduplication for producers.
Since you are using RabbitMQ, I would look into making event consumption an idempotent operation: it should be callable many times without producing any undesired results.
In your case, I would validate events as they are consumed, which is essentially "fixing the data on write": when an event is consumed, you know what the resulting record will look like, so check for it before inserting.
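A minimal sketch of such an idempotent consumer, assuming a Spring Data repository with a derived existsByEventId query and a unique business key on the event (the repository, entity, and event class names here are illustrative, not from your code):

    import java.util.List;
    import java.util.stream.Collectors;

    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class EventConsumer {

        private final EventRecordRepository repository;

        public EventConsumer(EventRecordRepository repository) {
            this.repository = repository;
        }

        @Transactional
        public void consume(List<IncomingEvent> events) {
            // Keep only events whose business key is not stored yet, so a
            // re-delivered event becomes a no-op instead of a constraint violation.
            List<EventRecord> fresh = events.stream()
                    .filter(e -> !repository.existsByEventId(e.getEventId()))
                    .map(e -> new EventRecord(e.getEventId(), e.getPayload()))
                    .collect(Collectors.toList());

            repository.saveAll(fresh);
        }
    }

A concurrent consumer could still insert the same key between the check and the save, so the unique constraint you already added stays useful as a last line of defence.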
For completeness, you could also fix your data on read by removing duplicates, but that is a less desirable approach, since a correct data set is easier to reason about.
Another option is to use a distributed transaction: a record is acknowledged on the queue only if it was saved to the database. This works, but adds a lot of complexity. Checking for duplicates on write seems like the best bet for you.
CodePudding user response:
Since your events are saved to the database first, you can simply create a unique index; duplicate records then cannot be inserted.
For MySQL, INSERT IGNORE ... will simply skip duplicates.
For PostgreSQL, INSERT ... ON CONFLICT DO NOTHING will simply skip duplicates.
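With Spring Data JPA this can be wired in through a native query on the repository, so the application never has to catch the duplicate-key exception. A minimal PostgreSQL sketch, assuming a hypothetical events table with a unique index on event_id (repository, entity, and column names are illustrative):

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.Modifying;
    import org.springframework.data.jpa.repository.Query;
    import org.springframework.data.repository.query.Param;
    import org.springframework.transaction.annotation.Transactional;

    public interface EventRecordRepository extends JpaRepository<EventRecord, Long> {

        // The unique index on event_id absorbs duplicates: a re-delivered event
        // inserts zero rows instead of raising a constraint violation.
        @Modifying
        @Transactional
        @Query(value = "INSERT INTO events (event_id, payload) VALUES (:eventId, :payload) "
                     + "ON CONFLICT (event_id) DO NOTHING",
               nativeQuery = true)
        int insertIgnoringDuplicate(@Param("eventId") String eventId,
                                    @Param("payload") String payload);
    }

The returned int is the number of rows actually inserted. For MySQL the same method works with INSERT IGNORE in place of the ON CONFLICT clause.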