There are two important fields for controlling the concurrency level in Java GCP PubSub consumer:
- Parallel pull count
- Number of executor threads
From the official example:
setParallelPullCount
determines how many StreamingPull streams the subscriber will open to receive message. It defaults to 1.setExecutorProvider
configures an executor for the subscriber to process messages. Here, the subscriber is configured to open 2 streams for receiving messages, each stream creates a new executor with 4 threads to help process the message callbacks. In total 2x4=8 threads are used for message processing.
So parallel pull count, if I'm not mistaken, directly refers to the number of Java executors (=thread pools), and number of executor threads sets the amount of threads per each pool.
Normally I reason about separate thread pools as having different use cases or responsibilities, so we might for example have one unbounded cached thread pool for IO, a fixed thread pool for CPU-bound ops, a single (or low number) threaded pool for async IO notifications, and so on.
But what would be the benefit of having two or more thread pools with identical properties for consuming and processing pubsub messages, compared to simply having a single thread pool with maximum desired number of threads? For example, if I can spare a total of 8 threads on the subscriber, what would be the concrete reason for using 1x8 vs 2x4 combination? (a single pool of 8 threads, versus pull count=2 using 4 threads each)?
CodePudding user response:
The setParallelPullCount
option doesn't just refer to the number of Java Executor
s, it refers to the number of streams created that request messages from the server. The different streams could potentially return a different number of messages due to a variety of factors. One may want to increase parallel pull count in order to process more messages in a single client than can be transmitted on a single stream (10MB/s). This is independent of the choice of whether or not to share executors/thread pools.
Whether or not to share a thread pool across the streams would be handled by calling setExecutorProvider
. If you set an ExecutorProvider
that returns the same Executor
on each call to getExecutor
, then the streams share it. If you have it return a new Executor
for each call, then they each have their own dedicated Executor
. The default ExecutorProvider
does the latter.
If one calls setParallelPullCount(X)
, then setExecutor
gets called X
times to get an Executor
for each stream. The choice between a shared one across all of them or individual ones for each probably doesn't change much the vast majority of the time. If you are trying to keep the number of overall threads relatively low, then sharing a single Executor
may be helpful in doing that.
The choice between X Executor
s with Y threads and one Executor
with X*Y threads really comes down to the capability to share such resources if the amount of data coming from each stream is vastly different, which probably isn't going to be the case most of the time. If it is, then a shared Executor
means that a particularly saturated stream could "borrow" threads from an unsaturated one. On the other hand, using individual Executor
s could mean that in such a scenario, messages on the stream with fewer messages are as able to get through as messages on the saturated stream.