Azure Eventhub Consumer

Time:06-18

Why do we need a blob container in an Azure storage account for an Eventhub consumer client (I'm using Python)? Why can't we consume the messages from the event hub (a topic, in Kafka terminology) directly, as we do in Kafka? Or can it be done in some other way?

I'm following the official Azure documentation linked below: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-python-get-started-send

Answer:

You are consuming the messages directly from the event hub. The storage account is not used as an intermediate buffer or relay in any way. Instead, it is used for checkpointing:

Checkpointing is a process by which readers mark or commit their position within a partition event sequence. Checkpointing is the responsibility of the consumer and occurs on a per-partition basis within a consumer group. This responsibility means that for each consumer group, each partition reader must keep track of its current position in the event stream, and can inform the service when it considers the data stream complete.

If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group. When the reader connects, it passes the offset to the event hub to specify the location at which to start reading. In this way, you can use checkpointing to both mark events as "complete" by downstream applications, and to provide resiliency if a failover between readers running on different machines occurs. It's possible to return to older data by specifying a lower offset from this checkpointing process. Through this mechanism, checkpointing enables both failover resiliency and event stream replay.
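As a rough illustration of the bookkeeping the quoted passage describes, here is a toy, stdlib-only model (not the Azure SDK). A dictionary keyed by (consumer group, partition) holds each reader's position, a new reader resumes from it, and passing a lower offset replays older events. Assumptions: offsets are plain list indices, and the toy stores the offset of the next event to read, whereas real Event Hubs offsets are service-assigned values and the SDK records the last processed event. The names ToyCheckpointStore and read_batch are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpointStore:
    # Maps (consumer_group, partition_id) -> offset of the next event to read.
    positions: dict = field(default_factory=dict)

    def load(self, group, partition, default=0):
        return self.positions.get((group, partition), default)

    def save(self, group, partition, next_offset):
        self.positions[(group, partition)] = next_offset


def read_batch(partition_log, store, group, partition, max_events, start=None):
    """Read up to max_events from a list standing in for one partition.

    start=None resumes from the stored checkpoint; an explicit lower start replays.
    """
    begin = store.load(group, partition) if start is None else start
    batch = partition_log[begin:begin + max_events]
    # Commit the position just past the last event handed to the application,
    # so whichever reader connects next (e.g. after a failover) continues from here.
    store.save(group, partition, begin + len(batch))
    return batch
```

For example, two successive calls with max_events=2 on the same group and partition return ['e0', 'e1'] and then ['e2', 'e3'], while a call for a different consumer group starts again at 'e0', because each consumer group keeps its own checkpoint.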

In summary: the storage account stores information about the readers (which consumer currently owns which partition) and their position within each partition.
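In the Python azure-eventhub package, the storage account only comes into play through the checkpoint_store argument of EventHubConsumerClient, and that argument is optional. The sketch below assumes the SDK is installed; the environment variable names EVENT_HUB_CONN_STR and EVENT_HUB_NAME and the function name receive_forever are hypothetical. Without a checkpoint store, the client still reads events straight from the event hub, but checkpoints are kept only in memory (lost when the process restarts) and partitions are not load-balanced across multiple client instances.

```python
import os


def on_event(partition_context, event):
    # Called once per received event; the event comes straight from the event hub.
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    # With a checkpoint store this persists the position (e.g. to a blob);
    # without one, the SDK only keeps it in memory for this client instance.
    partition_context.update_checkpoint(event)


def receive_forever(checkpoint_store=None):
    # Imported lazily so this module can be loaded without the SDK installed.
    from azure.eventhub import EventHubConsumerClient

    client = EventHubConsumerClient.from_connection_string(
        conn_str=os.environ["EVENT_HUB_CONN_STR"],   # hypothetical env var name
        consumer_group="$Default",
        eventhub_name=os.environ["EVENT_HUB_NAME"],  # hypothetical env var name
        checkpoint_store=checkpoint_store,           # None: in-memory checkpoints only
    )
    with client:
        # "-1" = start at the beginning of a partition that has no checkpoint yet.
        client.receive(on_event=on_event, starting_position="-1")
```

Calling receive_forever() consumes without any storage account. To get the durable behaviour from the quickstart, pass BlobCheckpointStore.from_connection_string(storage_conn_str, container_name) from the azure-eventhub-checkpointstoreblob package as checkpoint_store.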

If you don't want to use Azure Blob Storage, you can write your own checkpoint store implementation; see this question: Is there a way to store the azure Eventhub checkpoint to a remote bucket such as Google cloud bucket?
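A custom store has to persist two things: partition ownership (for load balancing between consumers) and checkpoints. Below is a hypothetical in-memory sketch whose method names mirror azure.eventhub.CheckpointStore (list_ownership, claim_ownership, update_checkpoint, list_checkpoints). The dictionary keys used here (fully_qualified_namespace, eventhub_name, consumer_group, partition_id, owner_id, etag, offset, sequence_number) are assumptions based on the SDK reference, so verify them before relying on this. A real implementation would subclass CheckpointStore and write to durable storage such as a cloud bucket instead of dictionaries.

```python
import time
import uuid


class InMemoryCheckpointStore:
    """Hypothetical store mirroring azure.eventhub.CheckpointStore method names."""

    def __init__(self):
        self._ownership = {}    # (namespace, hub, group, partition_id) -> ownership dict
        self._checkpoints = {}  # (namespace, hub, group, partition_id) -> checkpoint dict

    @staticmethod
    def _key(d):
        return (d["fully_qualified_namespace"], d["eventhub_name"],
                d["consumer_group"], d["partition_id"])

    def list_ownership(self, fully_qualified_namespace, eventhub_name, consumer_group, **kwargs):
        scope = (fully_qualified_namespace, eventhub_name, consumer_group)
        return [dict(o) for k, o in self._ownership.items() if k[:3] == scope]

    def claim_ownership(self, ownership_list, **kwargs):
        claimed = []
        for requested in ownership_list:
            key = self._key(requested)
            current = self._ownership.get(key)
            # Optimistic concurrency: a claim on an already-owned partition succeeds
            # only if the caller presents the etag it last saw.
            if current is not None and current.get("etag") != requested.get("etag"):
                continue
            new = dict(requested, etag=str(uuid.uuid4()), last_modified_time=time.time())
            self._ownership[key] = new
            claimed.append(dict(new))
        return claimed

    def update_checkpoint(self, checkpoint, **kwargs):
        # One checkpoint per partition per consumer group; the latest write wins.
        self._checkpoints[self._key(checkpoint)] = dict(checkpoint)

    def list_checkpoints(self, fully_qualified_namespace, eventhub_name, consumer_group, **kwargs):
        scope = (fully_qualified_namespace, eventhub_name, consumer_group)
        return [dict(c) for k, c in self._checkpoints.items() if k[:3] == scope]
```

The etag check is what lets several consumer processes compete for partitions safely: a process holding a stale view of the ownership record loses the claim instead of overwriting another owner.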
