I read rows from a file (let's call it customers) and publish them to an Event Hub. Each row contains a customer_id and some data. The same customer may appear several times in the file. So I want to use customer_id as the partition key to guarantee that ordering is preserved for the same customer, while still sending messages in batches for performance reasons. Seems easy enough...
So, you can set the partition key when sending a set of messages using
var opts = new SendEventOptions
{
    // Applies to every event passed to this SendAsync call
    PartitionKey = customer.CustomerId
};
await client.SendAsync(messages, opts);
But this sets the partition key on the batch itself, so every message in the batch ends up with the same key.
Is it even possible to set a partition key on each message and still use batches in a sane way? Preferably, I would like to set the key on each message and then just add them to a batch and send it.
I'm using the Azure.Messaging.EventHubs namespace and C#.
CodePudding user response:
Unfortunately, what you're looking to do is not possible. The Event Hubs service requires that all events in a batch be assigned the same partition key.
Your best bet using partition keys would be to build a batch for each partition key that you're using and add to them as you process, sending each batch when it is full or when some flush interval has elapsed.
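As a rough illustration of the batch-per-key idea, here is a minimal sketch. KeyedBatcher is a hypothetical helper, not part of the SDK; it assumes an already-constructed EventHubProducerClient and only handles the "send when full" case, leaving a timer-based flush to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Hypothetical helper (not part of the SDK): keeps one open batch per partition key.
public sealed class KeyedBatcher
{
    private readonly EventHubProducerClient _producer;
    private readonly Dictionary<string, EventDataBatch> _batches =
        new Dictionary<string, EventDataBatch>();

    public KeyedBatcher(EventHubProducerClient producer) => _producer = producer;

    public async Task AddAsync(string partitionKey, EventData eventData)
    {
        if (!_batches.TryGetValue(partitionKey, out EventDataBatch batch))
        {
            batch = await CreateBatchAsync(partitionKey);
            _batches[partitionKey] = batch;
        }

        if (batch.TryAdd(eventData))
        {
            return;
        }

        if (batch.Count == 0)
        {
            // Nothing else is in the batch, so the event itself exceeds the size limit.
            throw new InvalidOperationException("Event is too large to fit in an empty batch.");
        }

        // The batch is full: send it, then start a new one for the same key.
        // Sending sequentially per key keeps that key's events in order.
        await _producer.SendAsync(batch);
        batch.Dispose();

        batch = await CreateBatchAsync(partitionKey);
        _batches[partitionKey] = batch;

        if (!batch.TryAdd(eventData))
        {
            throw new InvalidOperationException("Event is too large to fit in an empty batch.");
        }
    }

    // Sends every non-empty batch; call at end of input or from a flush timer.
    public async Task FlushAsync()
    {
        foreach (EventDataBatch batch in _batches.Values)
        {
            if (batch.Count > 0)
            {
                await _producer.SendAsync(batch);
            }
            batch.Dispose();
        }
        _batches.Clear();
    }

    private async Task<EventDataBatch> CreateBatchAsync(string partitionKey) =>
        await _producer.CreateBatchAsync(new CreateBatchOptions { PartitionKey = partitionKey });
}
```

Usage would look something like calling AddAsync(row.CustomerId, new EventData(bytes)) for each row and FlushAsync() once the file has been read. Note that a file with many distinct customers means many open, mostly empty batches at once.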
Alternatively, you could assign each customer identifier to a specific partition and then build a batch per partition with the approach described above. This would be more efficient, as you'd have a smaller number of batches to manage and would likely be filling each batch more quickly.
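One way this per-partition assignment might look is sketched below. PartitionAssigner and its FNV-1a hash are illustrative assumptions rather than anything the service or SDK prescribes; a hand-rolled hash is used because string.GetHashCode is not stable across processes in modern .NET. The mapping also assumes the partition count of the Event Hub does not change.

```csharp
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Producer;

// Hypothetical helper: maps a customer identifier to a fixed partition.
public static class PartitionAssigner
{
    // Stable 32-bit FNV-1a hash of the UTF-8 bytes, reduced to [0, partitionCount).
    public static int PartitionIndex(string customerId, int partitionCount)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(customerId))
        {
            unchecked
            {
                hash ^= b;
                hash *= 16777619;
            }
        }
        return (int)(hash % (uint)partitionCount);
    }

    // Creates a batch bound to the partition chosen for this customer.
    // partitionIds is expected to come from producer.GetPartitionIdsAsync().
    public static async Task<EventDataBatch> CreateBatchForCustomerAsync(
        EventHubProducerClient producer, string[] partitionIds, string customerId)
    {
        string partitionId = partitionIds[PartitionIndex(customerId, partitionIds.Length)];
        return await producer.CreateBatchAsync(new CreateBatchOptions { PartitionId = partitionId });
    }
}
```

With this, the number of open batches is bounded by the number of partitions rather than the number of customers, and all events for a given customer still land on one partition in the order they were added.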