Cosmos DB Change Feed Processor distribution of traffic to multiple instances in the same deployment

Time:03-16

I am using the .NET SDK V3 (3.26.0) for Cosmos DB.

According to the documentation, the Cosmos DB Change Feed Processor should distribute traffic in parallel to multiple instances in the same deployment unit (same processor name and lease container).

I have been trying to do that, running three (and more) instances, but only one instance receives calls, the others remain idle.

When the receiving instance is stopped, I would expect another instance to pick up almost immediately; instead, 30-60 seconds pass before another instance begins receiving messages.

Questions:

  1. How can we ensure that the change feed processor distributes calls to multiple active instances in the same deployment unit?
  2. How can we ensure that the change feed processor fails over to other active instances quickly when an instance stops?
  3. What happens if an instance crashes without calling StopAsync?

CodePudding user response:

That documentation says that:

the change feed processor will, using an equal distribution algorithm, distribute all the leases in the lease container across all running instances of that deployment unit and parallelize compute

How many leases are in your lease container? Leases are documents that contain an "Owner" property and have an id that starts with your processorName.
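For illustration, a lease document looks roughly like this (the "Owner" property and the processor-name id prefix are as described above; the remaining field names and values are from memory and may vary by SDK version):

```json
{
  "id": "myProcessor..0",
  "Owner": "instance-1",
  "LeaseToken": "0",
  "ContinuationToken": "\"42\""
}
```

"Owner" holds the instance name currently processing that lease, and each lease maps to one partition key range of the monitored container.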

The time it takes for running instances to detect a crash or sudden stop of another instance is governed by two settings (https://docs.microsoft.com/dotnet/api/microsoft.azure.cosmos.changefeedprocessorbuilder.withleaseconfiguration?view=azure-dotnet#microsoft-azure-cosmos-changefeedprocessorbuilder-withleaseconfiguration(system-nullable((system-timespan))-system-nullable((system-timespan))-system-nullable((system-timespan)))): the acquireInterval, which defaults to 17 seconds (how often each instance scans for leases that have been dropped), and the expirationInterval, which defaults to 60 seconds (how long until a lease is considered expired, i.e. with no owner).
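Those intervals can be tightened through WithLeaseConfiguration when building the processor. A sketch, assuming a monitored container, a lease container, and a handler named HandleChangesAsync (the interval values here are illustrative; shorter intervals mean more frequent polling of the lease container and therefore higher RU consumption):

```csharp
// Sketch: tighten lease timing so a dropped instance is detected faster.
ChangeFeedProcessor processor = monitoredContainer
    .GetChangeFeedProcessorBuilder<MyDocument>(
        processorName: "myProcessor",
        onChangesDelegate: HandleChangesAsync)
    .WithInstanceName("instance-1")       // must be unique per running instance
    .WithLeaseContainer(leaseContainer)
    .WithLeaseConfiguration(
        acquireInterval: TimeSpan.FromSeconds(5),     // default 17s: how often to scan for free leases
        expirationInterval: TimeSpan.FromSeconds(20)) // default 60s: when an unrenewed lease counts as abandoned
    .Build();

await processor.StartAsync();
```

Each running instance must pass a distinct value to WithInstanceName; otherwise the processor cannot tell the instances apart when distributing leases.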

  1. As explained, what is distributed is the leases. If you have 1 lease, because the monitored collection has a single physical partition, then 1 active instance is your current maximum. As the collection grows, more leases will appear dynamically.
  2. That already happens, as you describe; the time it takes is governed by the configuration mentioned above.
  3. The leases owned by the instance will eventually be considered expired and another instance will grab them.
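To verify how many leases exist and which instance owns each one, you can query the lease container directly. A sketch, assuming a `Container` reference named `leaseContainer` (note the lease container may also hold a bookkeeping document without an "Owner" property):

```csharp
// Sketch: list lease documents and their current owners.
// One lease per owner means all work is going to that single instance.
using FeedIterator<dynamic> iterator = leaseContainer.GetItemQueryIterator<dynamic>(
    "SELECT c.id, c.Owner FROM c");
while (iterator.HasMoreResults)
{
    foreach (var lease in await iterator.ReadNextAsync())
    {
        Console.WriteLine($"{lease.id} -> {lease.Owner}");
    }
}
```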