Spring Kafka : Load Balancing between Consumers inside Kubernetes

Time:12-15

Small question regarding Spring Boot Kafka apps deployed in Kubernetes and how to load balance them, please.

Background: I used to have a very simple Spring Boot web app, exposed over HTTP, just doing a rather complex and lengthy computation.

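import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloHttp {

    @Autowired BusinessService businessService;

    // Each HTTP request triggers one long-running computation
    @GetMapping("/business")
    public String veryComplicatedAndTimeConsumingBusinessLogic(@RequestBody String request) {
        return businessService.veryComplicatedAndTimeConsumingBusinessLogic(request);
    }
}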

As this web app got popular and many more clients started using it, we decided to containerize it and deploy it in Kubernetes (AWS). We created 5 replicas of it with a Kubernetes Deployment / ReplicaSet. We then created a Kubernetes Service of type LoadBalancer and, like magic, we could see requests being load balanced between the 5 replicas. Meaning, one pod would process one request, then another pod would process the next one, and so on.

Some organizational change happened, and instead of sending requests over HTTP, the clients now all put the payload into Kafka. (The question is not about the legitimacy of this choice).

We then migrated this web app using Spring Kafka, to something like this:

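import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class HelloKafka {

    @Autowired BusinessService businessService;

    // Each record read from the topic triggers one long-running computation
    @KafkaListener(topics = "businessTopic")
    public void veryComplicatedAndTimeConsumingBusinessLogic(String message) {
        businessService.veryComplicatedAndTimeConsumingBusinessLogic(message);
    }
}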

We only know the Kafka host and the topic (no information about consumer group?)

Still, like magic, on a single instance, we could see the application consuming the messages from Kafka and processing them.

As the load is still the same, we decided to deploy the same application in Kubernetes again, using a Deployment / ReplicaSet and a Service of type LoadBalancer.

However, strangely, we are not observing the load balancing mechanism at all.

May I ask what I missed, please?

Thank you

CodePudding user response:

no information about consumer group?

You define that on your own. groupId is a param to @KafkaListener.
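
As a minimal sketch of what that could look like on the listener from the question: the group name "business-workers" is a made-up placeholder, and the only requirement is that every replica uses the same value. In a Spring Boot app the same thing can usually be done with the spring.kafka.consumer.group-id property instead of the annotation attribute.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class HelloKafka {

    // BusinessService is the asker's own class from the question
    @Autowired BusinessService businessService;

    // "business-workers" is a hypothetical group name; all replicas must share it
    // so that Kafka treats them as members of one consumer group
    @KafkaListener(topics = "businessTopic", groupId = "business-workers")
    public void veryComplicatedAndTimeConsumingBusinessLogic(String message) {
        businessService.veryComplicatedAndTimeConsumingBusinessLogic(message);
    }
}
```

Sharing a group is necessary but not sufficient: the topic also needs at least as many partitions as replicas for every pod to receive records.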

we are not observing the load balancing mechanism at all

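It's not a Kubernetes problem. A Service of type LoadBalancer only routes inbound connections, whereas Kafka consumers open outbound connections and pull records from the brokers, so the Service plays no part here. Kafka consumers do not "distribute load" either. Instead, the partitions of the topic are divided among the consumers in the same consumer group (again, something you set), and each partition is read by only one consumer in that group.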

So, one of the following applies:

  • you didn't set a groupId, so it may be auto-generated by Spring, and each instance is reading all partitions
  • your topic has one partition (which is the default if it was auto-created); therefore at most one consumer in the group can read the topic
  • the upstream producers are only sending data to one partition, so only one consumer is reading it, and that is out of your control
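
The first two cases can be illustrated with a toy simulation in plain Java. This is only a sketch: the class name, the pod names, and the round-robin strategy are assumptions for illustration, and it is not the assignor Kafka actually uses. It only models the rule that each partition is owned by exactly one member of a group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka or Spring Kafka
class PartitionAssignmentSketch {

    // Spreads partitions 0..partitionCount-1 over the members of ONE consumer group.
    // Each partition gets exactly one owner; a member may end up with no partitions.
    // Assumes the members list is not empty.
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> pods = List.of("pod-1", "pod-2", "pod-3", "pod-4", "pod-5");

        // Same group, one partition: only the first pod owns it, the others stay idle
        System.out.println(assign(pods, 1));

        // Same group, five partitions: one partition per pod
        System.out.println(assign(pods, 5));

        // A different group per pod: each group is assigned every partition,
        // so every pod processes every message
        for (String pod : pods) {
            System.out.println(assign(List.of(pod), 5));
        }
    }
}
```

In this model, work is only spread across the 5 replicas when they share one group and the topic has at least 5 partitions that actually receive data.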

legitimacy of this choice

Kafka is generally considered more highly available than an HTTP server, with or without replicas. It also acts as a buffer that helps prevent DoS attacks against those endpoints. Plus, you probably don't need the overhead of HTTP compared with plain TCP.
