Home > Net >  Azure.Messaging.EventHubs EventProcessorClient - Writing checkpoint on shutdown/restart
Azure.Messaging.EventHubs EventProcessorClient - Writing checkpoint on shutdown/restart

Time:12-09

We are using the EventProcessorClient class to read messages from all partitions of an Event Hub.

To enhance the checkpoint writing, we hold a counter and perform the checkpointing after processing multiple messages in ProcessEventAsync instead of after each one.

When the service restarts or gets deployed, we noticed that we re-read previously processed messages due to the fact that the checkpoint was not updated right before stopping.

enter image description here

However, when using EventProcessorClient, the implementation gets wrapped and only the PartitionClosingAsync event gets called, which does not have the required details to update the checkpoint.

Is there a way to update the checkpoint to the latest one when shutting down while using EventProcessorClient?

CodePudding user response:

There are a couple of different ways to do this, with varying degrees of complexity. However, in most scenarios, it's unlikely that checkpointing on shutdown is going to have the effect that you would like it to. For the interested, context is way down below in "Load balancing details".

How-to: A simple approach

The most straightforward approach would be to hold on to the ProcessEventArgs for the last event processed by each partition in a class-level member to do so.

With this, you're still very likely to run into the dueling owners scenario described in the details.

How-to: More complex, less memory, same challenges

If you extend the EventProcessorClient, you could hold onto just the last processed offset for each partition and use it to call the protected UpdateCheckpointAsync method on the processor. This saves you some space, as you're holding onto just a long rather than the state associated with the event args.

With this, you're still very likely to run into the dueling owners scenario described in the details.

How-to: A lot more complexity, more efficient, and without overlaps

For this one to make sense, you'd have to have a stable number of processors - no dynamic scaling.

Checkpointing takes the same approach as bove - you'll extend the processor client and using the offset to call UpdateCheckpointAsync as described above.

To avoid ownership changes and rewinds, you'll assign the processor a set of static partitions and bypass load balancing. The approach is described in the in the Static partition assignment sample, which can be applied when extending the EventProcessorClient.

With this, you do not have the potential for overlapping owners. Since the partitions are statically assigned, you know this node will be the only one to process them.

The trade-off here is that you lose load balancing and dynamic assignment. If the node crashes or loses network connectivity, nobody will be processing the set of partitions that it owns. This generally works well in host environments where there's an orchestrator monitoring nodes and ensuring they're healthy.

Load balancing details

When partitions change owners, it not an ordered hand-off; processors do not coordinate to ensure that the old owner has stopped before the new owner begins reading. As a result, there is often a period of overlap where the old owner has a set of events in memory and won't be aware that a new owner has taken over until it tries to read the next batch.

During this time, both the old and new owners are processing events from the same partition. Both may be emitting checkpoints. If the old owner emits a checkpoint when the processing stops for that partition, there's a good chance it will rewind the checkpoint position to an earlier point than the current owner has written. That causes a bigger rewind if ownership changes again before the new owner emits a checkpoint.

We generally recommend that you expect to rewind by one checkpoint any time you're scaling the number of processors or deploying/rebooting nodes. This is something to take into account when devising your checkpoint strategy.

  • Related