How to avoid duplicate data processing in EventhubTriggered Function app

Time:11-19

I am trying to process a large number of events (100,000) with 20 Event Hub partitions and a scalable function app set to 20 instances. In the function app, maxBatchSize is set to 100 and batchCheckpointFrequency is set to 1.

The function app is an Event Hub-triggered function app that processes batches of event data.

The function app is receiving and processing duplicate messages (probably 5% of the total). Is there any setting or change I need to make to avoid the duplicate data processing?

Is there a way for the function app to know whether data in different batches of events is duplicated?

CodePudding user response:

Azure Event Hubs guarantees at-least-once delivery, see the docs. You should make sure your logic is idempotent. If you cannot do that, you have to include something like a unique ID in each message and, when you process a message, check that ID against a distributed store to see whether a message with that ID has already been processed.
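To make the ID check concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `InMemoryDedupStore`, `add_if_absent`, `release` and `process_batch` are hypothetical, each event is assumed to be a dict carrying a unique `"id"` field, and the in-memory set only stands in for a real distributed store that offers an atomic "insert if absent" operation and is shared by all instances.

```python
import threading


class InMemoryDedupStore:
    """Hypothetical stand-in for a distributed store shared by all instances."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def add_if_absent(self, event_id):
        """Atomically claim event_id; return True only for the first claim."""
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True

    def release(self, event_id):
        """Give up a claim so a failed event can be retried later."""
        with self._lock:
            self._seen.discard(event_id)


def process_batch(events, store, handler):
    """Run handler once per unique event id; return how many events were handled."""
    handled = 0
    for event in events:
        event_id = event["id"]  # assumption: the producer attaches a unique id
        if not store.add_if_absent(event_id):
            continue  # already claimed in this or an earlier batch
        try:
            handler(event)
        except Exception:
            store.release(event_id)  # let a redelivery process it again
            raise
        handled += 1
    return handled
```

Calling `process_batch` on two batches that overlap by one ID with the same store handles the shared event only once. A real store would also need an expiry on the recorded IDs so that it does not grow without bound.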

Azure Functions does not provide this de-duplication logic out of the box, that is for sure.

CodePudding user response:

The Consumption plan auto-scales depending on the incoming Event Hubs message traffic, which may cause duplicate processing because partitions are rebalanced across instances as the app scales. I think you can try the Premium or Dedicated plan and set a fixed number of instances to avoid auto-scaling.

If you want to stay with the Consumption plan, you can reduce the batch size and the checkpointing interval (maxBatchSize and batchCheckpointFrequency). This should also help to reduce the number of duplicates.
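The reason smaller batches and more frequent checkpoints help is that a new partition owner resumes from the last checkpoint, so everything processed after that checkpoint is read again. The following Python sketch is a simplified model of that effect, not the actual Functions runtime: the function names are hypothetical, and it assumes a checkpoint is written immediately after every `batch_checkpoint_frequency` full batches.

```python
def replayed_after_failover(events_processed, max_batch_size, batch_checkpoint_frequency):
    """Events re-read by a new owner if ownership moves after events_processed events.

    Simplified model: a checkpoint is written right after every
    batch_checkpoint_frequency full batches of max_batch_size events.
    """
    events_per_checkpoint = max_batch_size * batch_checkpoint_frequency
    last_checkpoint = (events_processed // events_per_checkpoint) * events_per_checkpoint
    return events_processed - last_checkpoint


def worst_case_replay(max_batch_size, batch_checkpoint_frequency):
    """Upper bound per partition: a full window processed but not yet checkpointed."""
    return max_batch_size * batch_checkpoint_frequency


# Ownership of a partition moves after 250 events have been processed.
print(replayed_after_failover(250, 100, 1))  # 50
print(replayed_after_failover(250, 10, 1))   # 0
print(replayed_after_failover(250, 100, 5))  # 250
```

Under this model, the settings from the question (maxBatchSize 100, batchCheckpointFrequency 1) leave a window of up to 100 events per partition that can be replayed each time a partition changes owner. Since the checkpoint frequency is already 1, only a smaller batch size would shrink that window further.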
