How to scale out microservices when Kafka is used?

I have built a microservice platform on Kubernetes, and the services use Kafka as their message queue. This has raised a confusing problem. Kubernetes is designed to make it easy to scale microservices out, but when I scale a service beyond the number of Kafka partitions, the extra instances get nothing to consume. What should I do?

CodePudding user response：

This is a Kafka limitation and has nothing to do with your service scheduler.

Kafka consumer groups simply cannot scale beyond the partition count. So, if you have a single-partition topic because you care about strict event ordering, then only one replica of your service can be actively consuming from the topic, and you'd need to handle failover in specific ways that are outside the scope of Kafka itself.

If your concern is the k8s autoscaler, then you can look into the KEDA autoscaler for Kafka services, which can scale on consumer group lag.

CodePudding user response：

Kafka, as OneCricketeer notes, bounds the parallelism of consumption by the number of partitions.

If you couple processing with consumption, this limits the number of instances which will be performing work at any given time to the number of partitions to be consumed. Because the Kafka consumer group protocol includes support for reassigning partitions consumed by a crashed (or non-responsive...) consumer to a different consumer in the group, running more instances of the service than there are partitions at least allows for the other instances to be hot spares for fast failover.
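To make the idle-instance and hot-spare behaviour concrete, here is a minimal sketch in plain Python. It is a simplified round-robin model, not Kafka's actual assignor or rebalance protocol, and the pod names and the `assign_partitions` / `reassign_after_crash` helpers are made up for illustration.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Simplified model: each partition goes to exactly one consumer,
    so consumers beyond len(partitions) end up with nothing.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def reassign_after_crash(partitions, consumers, crashed):
    """Model a rebalance: drop the crashed consumer and assign again."""
    survivors = [c for c in consumers if c != crashed]
    return assign_partitions(partitions, survivors)


partitions = [0, 1, 2]
consumers = ["pod-a", "pod-b", "pod-c", "pod-d", "pod-e"]  # hypothetical replicas

before = assign_partitions(partitions, consumers)
idle = [c for c, assigned in before.items() if not assigned]
after = reassign_after_crash(partitions, consumers, "pod-a")

print(before)  # pod-d and pod-e hold no partitions
print(idle)    # the hot spares
print(after)   # a former spare now owns a partition
```

With three partitions and five replicas, two replicas sit idle until a rebalance hands one of them the partition of the crashed instance.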

It's possible to decouple processing from consumption. In broad outline, every instance of your service joins the consumer group, and up to number-of-partitions of those instances will actually consume from the topic. Each consuming instance can then make a load-balanced network request to another (or the same) instance, based on the message it consumed, to do the processing. If you allow each consumer to have multiple requests in flight, this expands your scaling horizon to max-in-flight-requests * number-of-partitions.
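The max-in-flight-requests * number-of-partitions bound can be illustrated with a small `asyncio` simulation. This is only a sketch under assumptions: no Kafka client is involved, the partitions are in-memory lists, `asyncio.sleep` stands in for the network request to a processing instance, and `MAX_IN_FLIGHT`, `NUM_PARTITIONS`, `process` and `consume_partition` are hypothetical names.

```python
import asyncio

MAX_IN_FLIGHT = 4    # hypothetical per-consumer limit on outstanding requests
NUM_PARTITIONS = 3   # one consumer per partition in this simulation


async def process(message, stats):
    """Stand-in for a load-balanced request to a processing instance."""
    stats["current"] += 1
    stats["peak"] = max(stats["peak"], stats["current"])
    await asyncio.sleep(0.01)  # simulated processing time
    stats["current"] -= 1
    stats["done"] += 1


async def consume_partition(messages, stats):
    """Read one partition in order, keeping at most MAX_IN_FLIGHT requests open."""
    limit = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def bounded(message):
        try:
            await process(message, stats)
        finally:
            limit.release()

    tasks = []
    for message in messages:
        await limit.acquire()  # blocks while MAX_IN_FLIGHT requests are pending
        tasks.append(asyncio.create_task(bounded(message)))
    await asyncio.gather(*tasks)


async def main():
    stats = {"current": 0, "peak": 0, "done": 0}
    partitions = [[f"p{p}-m{i}" for i in range(20)] for p in range(NUM_PARTITIONS)]
    await asyncio.gather(*(consume_partition(msgs, stats) for msgs in partitions))
    return stats


stats = asyncio.run(main())
print(stats["done"], "messages processed; peak concurrency", stats["peak"])
```

The peak concurrency never exceeds `MAX_IN_FLIGHT * NUM_PARTITIONS`, which is the ceiling on how many processing instances can be kept busy at once in this model.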

If it happens that the messages in a partition don't need to be processed in order, simple round-robin load-balancing of the requests is sufficient.

Conversely, there may effectively be multiple logical streams of messages multiplexed into a given partition. For example, if messages are keyed by equipment ID, the second message for ID A needs to be processed after the first message for ID A, but could be processed in any order relative to messages for ID B. You can still decouple processing in this case, but it needs some care around ensuring ordering. Additionally, given the amount of throughput you should be able to get from a consumer of a single partition, needing to scale out to the point where you have more processing instances than partitions suggests that you'll want to investigate load-balancing approaches where, if request B needs to be processed after request A (presumably because request A could affect the result of request B), A and B get routed to the same instance. That way they can leverage local in-memory state rather than do a read-from-db then write-to-db pas de deux.
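One simple way to get that kind of affinity is to hash the message key to pick the processing instance. The sketch below assumes a fixed list of instance names (hypothetical) and uses `zlib.crc32` because Python's built-in `hash` for strings is randomized per process. It is plain modulo hashing, so changing the instance count remaps most keys; toolkits such as the ones mentioned below handle that more gracefully.

```python
import zlib


def route(key: str, instances: list[str]) -> str:
    """Pick the processing instance for a key.

    Same key and same instance list always give the same instance,
    so messages for one equipment ID can share in-memory state.
    """
    digest = zlib.crc32(key.encode("utf-8"))  # stable across processes
    return instances[digest % len(instances)]


instances = ["worker-0", "worker-1", "worker-2", "worker-3"]  # hypothetical names

# Messages in partition order, keyed by equipment ID.
messages = [("A", "first"), ("B", "first"), ("A", "second"), ("B", "second")]

per_instance: dict[str, list[tuple[str, str]]] = {name: [] for name in instances}
for key, payload in messages:
    per_instance[route(key, instances)].append((key, payload))

print(per_instance)
```

Each instance must still process the requests it receives for a given key in arrival order; the routing only guarantees that they arrive at the same place.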

This sort of architecture can be implemented in any language, though maintaining a reasonable level of availability and consistency is going to be difficult. There are frameworks and toolkits which can deliver a lot of this functionality: Akka (JVM), Akka.Net, and Protoactor all implement useful primitives in this area (disclaimer: I'm employed by Lightbend, which maintains and provides commercial support for one of those, though I'd have (and actually have) made the same recommendations prior to my employment there).

When consuming messages from Kafka in this style of architecture, you will definitely have to make the choice between at-most-once and at-least-once delivery guarantees and that will drive decisions around when you commit offsets. Note particularly that you need to be careful, if doing at-least-once, to not commit until every message up to that offset has been processed (or discarded), lest you end up with "at-least-zero-times", which isn't a useful guarantee. If doing at-least-once, you may also want to try for effectively-once: at-least-once with idempotent processing.
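For at-least-once with out-of-order completion, the offset to commit is bounded by the lowest offset that is still unprocessed. A minimal sketch of that bookkeeping follows; `OffsetTracker` is a hypothetical helper, not part of any Kafka client, and it follows the Kafka convention that the committed offset is the offset of the next message to read.

```python
class OffsetTracker:
    """Track completed offsets for one partition and expose the safe commit point."""

    def __init__(self, first_offset: int):
        self._next = first_offset  # lowest offset not yet completed
        self._done: set[int] = set()  # completed offsets above self._next

    def mark_done(self, offset: int) -> None:
        """Record that a message was processed (or deliberately discarded)."""
        self._done.add(offset)
        # Advance over any contiguous run of completed offsets.
        while self._next in self._done:
            self._done.remove(self._next)
            self._next += 1

    def committable(self) -> int:
        """Offset that is safe to commit: everything below it is finished."""
        return self._next


tracker = OffsetTracker(first_offset=100)

tracker.mark_done(102)
tracker.mark_done(101)
print(tracker.committable())  # 100: offset 100 is still in flight

tracker.mark_done(100)
print(tracker.committable())  # 103: offsets 100 to 102 are all finished
```

Committing anything higher than `committable()` while offset 100 is still in flight would lose that message on a crash, which is the "at-least-zero-times" trap described above.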