I have recently started working with Google Pub/Sub, using a push subscription to transfer data between Cloud Run instances.
During testing, I noticed that in a few cases there was a delay between publishing a message and the subscriber receiving it. So I tried calling the REST API directly instead of sending the data through Pub/Sub.
Could you help me understand the following two points:
- Which is faster?
- Which is more efficient?
Thank you,
KK
Answer:
Communicating directly between your Cloud Run instances vs. doing it through Cloud Pub/Sub likely has more implications than just which is faster. In the "good" case, where both your publisher and subscriber are up and running and not overloaded, communicating directly is likely going to be faster.
The reasons to use Pub/Sub come down to two main points: discoverability and reliability. For discoverability, is it guaranteed that your publishing Cloud Run instance will always know the URL of the subscribing Cloud Run instance? Will the transfer of data always be one-to-one? Could you ever have multiple Cloud Run instances that want to receive the messages? If so, how do you expect to update the publisher to send messages to all of them? If you communicate directly, you'll likely have to issue individual requests to each target Cloud Run instance and wait for a response from each one. If you use Cloud Pub/Sub, this is taken care of for you: your publishing Cloud Run instance only needs to send a message to Cloud Pub/Sub once, and each interested Cloud Run instance gets its own subscription and receives all of the messages.
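To make the fan-out point concrete, here is a minimal Python sketch that uses a toy, in-memory topic rather than the real Pub/Sub client. `InMemoryTopic`, `send_directly`, and the "billing"/"audit" receivers are hypothetical names for illustration only, and delivery here is a plain synchronous function call rather than an HTTP push.

```python
from typing import Callable, Dict, List

Handler = Callable[[bytes], None]


class InMemoryTopic:
    """Toy stand-in for a Pub/Sub topic (hypothetical; not the real client API).

    Every subscription gets its own copy of each message, so the publisher
    does not need to know who is listening or how many listeners exist.
    """

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Handler] = {}

    def subscribe(self, name: str, handler: Handler) -> None:
        # Adding a subscriber happens on the receiving side; publisher code is unchanged.
        self._subscriptions[name] = handler

    def publish(self, data: bytes) -> int:
        # A single publish call fans out to every registered subscription.
        for handler in self._subscriptions.values():
            handler(data)
        return len(self._subscriptions)


def send_directly(targets: List[Handler], data: bytes) -> int:
    # Direct calls: the publisher must track every target and call each one itself,
    # so adding a receiver means changing (and redeploying) the publisher.
    for target in targets:
        target(data)
    return len(targets)


# Usage: two receiving services, "billing" and "audit" (hypothetical names).
received: Dict[str, List[bytes]] = {"billing": [], "audit": []}

topic = InMemoryTopic()
topic.subscribe("billing", received["billing"].append)
topic.subscribe("audit", received["audit"].append)
fanout_count = topic.publish(b"order-42")

direct_count = send_directly(
    [received["billing"].append, received["audit"].append], b"order-43"
)
```

Both approaches reach both receivers here; the difference is where the list of receivers lives. With the topic it is maintained on the subscribing side, while with direct calls it is hard-coded in the publisher.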
The other main reason to use Pub/Sub is for reliability. What does your publishing Cloud Run instance do if the subscribing Cloud Run instance is down or overloaded? Is it going to buffer the messages? Write them to persistent storage? How does it manage that buffer or storage and ultimately redeliver messages? What if the Cloud Run instance restarts during this time? With Cloud Pub/Sub, you generally need not worry about any of these considerations because the service is designed to be highly available and buffer messages quickly when needed without affecting the publisher's performance.
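As a rough illustration of the work Pub/Sub takes off your hands, the following Python sketch shows the bare minimum of buffering and retry logic a publisher would need when calling a subscriber directly. `DirectSender` and `flaky_subscriber` are hypothetical; a real implementation would also need backoff between retries, persistent storage so pending messages survive a restart, and idempotent handling on the receiving side.

```python
import collections
from typing import Callable, Deque, List, Tuple


class DirectSender:
    """Hypothetical helper: buffering a publisher needs when it calls a subscriber directly.

    Failed sends are kept in an in-memory queue and retried on the next flush().
    The queue lives in process memory, so anything still pending is lost if the
    instance restarts.
    """

    def __init__(self, send: Callable[[bytes], bool], max_attempts: int = 5) -> None:
        self._send = send  # returns True on success (e.g. an HTTP 2xx), False otherwise
        self._max_attempts = max_attempts
        self._pending: Deque[Tuple[bytes, int]] = collections.deque()
        self.dropped: List[bytes] = []  # messages that exhausted their retries

    def submit(self, data: bytes) -> None:
        self._pending.append((data, 0))

    def flush(self) -> None:
        """Try each pending message once; requeue failures until max_attempts is reached."""
        for _ in range(len(self._pending)):
            data, attempts = self._pending.popleft()
            if self._send(data):
                continue
            attempts += 1
            if attempts < self._max_attempts:
                self._pending.append((data, attempts))
            else:
                self.dropped.append(data)

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage: a subscriber that is "down" for its first two calls (simulated).
calls = {"n": 0}
delivered: List[bytes] = []


def flaky_subscriber(data: bytes) -> bool:
    calls["n"] += 1
    if calls["n"] <= 2:
        return False
    delivered.append(data)
    return True


sender = DirectSender(flaky_subscriber)
sender.submit(b"a")
sender.submit(b"b")
sender.flush()  # both calls fail; both messages are requeued
pending_after_first_flush = sender.pending
sender.flush()  # the subscriber is back; both messages are delivered
```

Even this toy version leaves its queue exposed to an instance restart, which is exactly the failure case raised above, and it says nothing about where to store messages durably or when to call `flush()` again.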
So if speed is your only concern, your requests from one Cloud Run instance to the other are always going to be one-to-one, you will always know the address of the target Cloud Run instance, and you are okay without implementing more complicated buffering (in effect, accepting at-most-once delivery), then direct calls are probably going to be okay.
But if any of these considerations matter, Cloud Pub/Sub is going to be a much better choice. It may be slower, because each message takes an extra hop through the Pub/Sub service on its way to the subscriber. There are some things you can do to minimize that latency. Two common ones are:
- Make sure you only instantiate the publisher client once and reuse it rather than recreating the client for every publish.
- In your publisher batch settings, set `maxMessages` to 1 so every message is sent as soon as it is passed to `publish`. If your message throughput is relatively low, this will help. If your throughput is high, the key is to avoid waiting synchronously for the result of each publish, especially if you are publishing messages in a loop. By waiting asynchronously, you let the client batch more messages together and therefore send them more efficiently.
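The batching trade-off can be modeled with a small Python sketch. `ToyBatchPublisher` is a hypothetical stand-in, not the real client: it only flushes on a message-count threshold or on an explicit `flush()`, which stands in for the client's latency timer. In the Python client library the corresponding setting is `max_messages` on `google.cloud.pubsub_v1.types.BatchSettings`; in the Node.js client it is the `maxMessages` batching option mentioned above.

```python
from concurrent.futures import Future
from typing import List


class ToyBatchPublisher:
    """Hypothetical model of client-side batching (not the real Pub/Sub client).

    Messages accumulate until `max_messages` is reached, then the batch is sent
    as one request. Each publish() returns a Future that resolves with a fake
    message ID once its batch has been sent.
    """

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self.requests_sent: List[List[bytes]] = []  # one entry per outgoing request
        self._batch: List[bytes] = []
        self._futures: List[Future] = []

    def publish(self, data: bytes) -> Future:
        fut: Future = Future()
        self._batch.append(data)
        self._futures.append(fut)
        if len(self._batch) >= self.max_messages:
            self.flush()
        return fut

    def flush(self) -> None:
        # In a real client, a max-latency timer would also trigger this.
        if not self._batch:
            return
        self.requests_sent.append(self._batch)
        for i, fut in enumerate(self._futures):
            fut.set_result(f"msg-{len(self.requests_sent)}-{i}")
        self._batch, self._futures = [], []


# Low throughput: with max_messages=1 the message goes out immediately.
low_latency = ToyBatchPublisher(max_messages=1)
first = low_latency.publish(b"hello")

# High throughput: publish everything first, then wait on the futures.
high_throughput = ToyBatchPublisher(max_messages=100)
futures = [high_throughput.publish(f"event-{n}".encode()) for n in range(250)]
high_throughput.flush()  # stands in for the latency timer firing
message_ids = [f.result() for f in futures]
```

With `max_messages=1`, the future is already resolved when `publish` returns, which is what you want at low throughput. With a larger threshold, collecting the futures and waiting only after the loop lets the 250 messages go out in 3 requests (100, 100, and 50 messages), whereas `max_messages=1` would need 250 requests.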
So, to the efficiency question, there isn't a single answer. It depends a lot on the use case and the desired behavior. But in all likelihood, if efficiency means the amount of work you have to do yourself to get reliable delivery, Pub/Sub is the better choice.