Small question regarding Reactor Kafka, please.
Does the number of partitions become a performance bottleneck when the number of Kafka consumers is (much) greater than the number of partitions?
As a concrete example, let's say I have a topic named the-topic with only three partitions.
I consume from the topic with the app below:
@Service
public class MyConsumer implements CommandLineRunner {

    @Autowired
    private KafkaReceiver<String, String> kafkaReceiver;

    @Override
    public void run(String... args) {
        myConsumer().subscribe();
    }

    public Flux<String> myConsumer() {
        return kafkaReceiver.receive()
                .flatMap(oneMessage -> consume(oneMessage))
                .doOnNext(abc -> System.out.println("successfully consumed " + abc))
                .doOnError(throwable -> System.out.println("something bad happened while consuming: " + throwable.getMessage()));
    }

    private Mono<String> consume(ConsumerRecord<String, String> oneMessage) {
        // This first step is a heavy in-memory computation that transforms the incoming
        // message into the data to be saved. It is CPU-intensive, but has been verified
        // NON-BLOCKING with different tools, and takes about one second :D
        String transformedData = transformDataNonBlockingWithIntensiveOperation(oneMessage);
        // Then just save the transformed data into any REACTIVE repository :)
        return myReactiveRepository.save(transformedData);
    }
}
I dockerized the app and deployed it in Kubernetes.
With a cloud provider, I can easily run 60 of those containers, 60 of those apps.
And suppose, for the sake of this question, that each of my apps is super resilient and never crashes.
Does it mean that, since the topic has only three partitions, 57 of the consumers will be sitting idle at any given time?
How can I benefit from scaling up the number of containers when the number of partitions is low, please?
Thank you
CodePudding user response:
since the topic has only three partitions, that at any time, 57 other consumers will be wasted?
Yes. That's how the Kafka consumer API works; the framework you use around it isn't relevant.
benefits from scaling up the number of containers when the number of partitions is low
You need to separate the event processing (saving to a repository) from the actual consumption / poll loop. For example, push transformed events onto a non-blocking, external queue / external API, without waiting for a response. Then set up an autoscaler on that API endpoint.
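To illustrate the idea of decoupling the poll loop from the heavy processing, here is a minimal in-process sketch using only the JDK (a bounded queue plus a worker pool); the answer above suggests an *external* queue, but the principle is the same. All names here are illustrative, and `transformAndSave` is a stand-in for the CPU-intensive transform plus the repository save:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the consumption loop only enqueues records; a worker pool
// does the heavy transform independently, so slow processing never
// stalls the poll loop that holds the partition assignment.
public class DecoupledPipeline {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);
    private final ExecutorService workers;
    private final AtomicInteger processed = new AtomicInteger();

    DecoupledPipeline(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        String record = buffer.take(); // wait for work
                        transformAndSave(record);      // heavy step runs here
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    // Stand-in for the CPU-intensive transform + reactive save.
    private void transformAndSave(String record) { /* ... */ }

    // Called from the consumption loop: enqueue and return quickly.
    void onRecord(String record) throws InterruptedException {
        buffer.put(record); // back-pressures the poll loop only when the buffer is full
    }

    int processedCount() { return processed.get(); }

    void shutdown() { workers.shutdownNow(); }
}
```

With an external queue instead of the in-memory one, the worker side can be scaled to 60 containers independently of the three partition-bound consumers.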
CodePudding user response:
Does it mean, since the topic has only three partitions, that at any time, 57 other consumers will be wasted?
Yes. Within a single consumer group you can have at most as many actively consuming members as there are partitions; any additional consumers stay idle.
How to benefit from scaling up the number of containers when the number of partitions is low please?
You could register the containers in several different consumer groups. That will work, in the sense that every container gets records, but each group receives its own copy of every event. So, as mentioned by @OneCricketeer, if you don't want to process the same event multiple times, a separate event-processing pipeline behind the (at most three) active consumers is the better approach.
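For reference, which consumer group a container joins is controlled by the standard `group.id` consumer property. A hypothetical configuration splitting the 60 containers across groups might look like:

```properties
# Containers sharing a group.id split the 3 partitions among themselves;
# containers in a different group each receive a full copy of the stream.
group.id=the-topic-group-a
auto.offset.reset=earliest
```

Keep in mind that every group re-reads the whole topic, so this only makes sense when duplicate processing is acceptable or intended.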