Deciding between a TCP connection and a WebSocket

Time:12-21

We are developing a browser extension that sends every URL visited by a logged-in user to our backend APIs, where it is persisted.

Since the number of requests sent to the backend API will be huge, we are unsure whether to create a persistent connection via WebSocket or to use a TCP connection, i.e. HTTP REST API calls.

The data posted to the backend API doesn't need to arrive in real time, since we will only use it in our models, which don't require real-time data.

We are leaning towards HTTP REST API calls for the following reasons:

  • Easy to implement
  • Easy to scale (using auto-scaling techniques)
  • Everyone on the team is already comfortable with REST APIs

On the other hand, the cons would be:

  • At our scale, with a lot of POST requests going to the server, we're not sure it would be efficient
  • It feels like WebSockets could give us a more optimized infrastructure :(

I would love to hear from the community about any pitfalls of going with the REST API option.

Answer:

So first of all, TCP is the transport layer. You can't just use "raw" TCP; you have to create some protocol on top of it, because TCP only delivers an ordered stream of bytes with no message boundaries. You have to give meaning to that stream of data.
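As an illustration of what "giving meaning to the stream" involves: even the simplest custom protocol needs framing, so the receiver can tell where one message ends and the next begins. This is a minimal sketch assuming one common scheme, a 4-byte big-endian length prefix per message; encode_frame and FrameDecoder are hypothetical names, not part of any library.

```python
import struct

# Hypothetical framing: 4-byte big-endian length header, then the payload.
HEADER = struct.Struct(">I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles complete messages from arbitrarily split TCP reads."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append received bytes and return every message that is now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this message hasn't arrived yet
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

A single recv() may return half a message or several messages at once, which is why the decoder buffers. HTTP and WebSockets already solve this (and much more) for you.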

REST, HTTP or even WebSockets will never be as efficient as a custom-designed protocol on top of raw TCP (or even UDP). However, the gain may not be as spectacular as one might think. I've actually done such a transition once, and we saw only a few percent of performance gain. And it was neither easy to do correctly nor easy to maintain. Of course, YMMV.

Why is that? Well, the reason is that HTTP is already quite highly optimized. First of all, connections are persistent: HTTP/1.1 keeps the connection open by default (HTTP/1.0 needed the "keep alive" header for that), so the default HTTP mechanisms already reuse a connection while it is being used. Secondly, HTTP supports body compression (via the Content-Encoding header), and HTTP/2 also compresses headers. With HTTP/3 you get even more efficient TLS usage, because QUIC folds the TLS 1.3 handshake into the connection handshake.
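A minimal sketch of both points using only Python's standard library. It assumes a hypothetical backend (host and path are placeholders) that accepts gzip-compressed JSON request bodies; compressing a request body only works if the server is set up to decode it. post_batches reuses one HTTPSConnection for several POSTs, which is what persistent connections buy you.

```python
import gzip
import http.client
import json


def build_compressed_body(urls: list[str]) -> tuple[bytes, dict[str, str]]:
    """Serialize a batch of URLs as JSON and gzip it for an HTTP request body."""
    raw = json.dumps({"urls": urls}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",  # assumes the server decodes gzip bodies
    }
    return gzip.compress(raw), headers


def post_batches(host: str, path: str, batches: list[list[str]]) -> None:
    """Send several batches over a single (persistent) HTTPS connection."""
    conn = http.client.HTTPSConnection(host)
    try:
        for urls in batches:
            body, headers = build_compressed_body(urls)
            conn.request("POST", path, body=body, headers=headers)
            response = conn.getresponse()
            response.read()  # drain the response so the connection can be reused
    finally:
        conn.close()


urls = [f"https://example.com/articles/{i}" for i in range(500)]
body, headers = build_compressed_body(urls)
raw_size = len(json.dumps({"urls": urls}).encode("utf-8"))
print(f"raw: {raw_size} bytes, gzipped: {len(body)} bytes")
```

URL lists tend to be repetitive, so they usually compress well; measure with your own data before drawing conclusions.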

Another thing: since you don't require real-time data, you can buffer it. So instead of sending data each time it is available, you gather it for, say, a few seconds, minutes or maybe even hours, and send it all in one go. With such an approach, the difference between HTTP and a custom protocol will be even less noticeable.
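The buffering idea can be sketched as a small batcher that flushes either when enough URLs have accumulated or when the oldest buffered URL gets too old. This is an illustration, not a prescribed design: UrlBatcher, its thresholds, and the send callback (standing in for one HTTP POST) are all hypothetical, and the clock is injectable so the behavior is easy to test.

```python
import time
from typing import Callable, Optional


class UrlBatcher:
    """Buffers visited URLs and hands them to `send` as one batch.

    Flushes when the buffer reaches max_items (checked in add) or when the
    oldest buffered URL is at least max_age_seconds old (checked in tick).
    """

    def __init__(
        self,
        send: Callable[[list[str]], None],  # hypothetical: e.g. one POST per batch
        max_items: int = 100,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: list[str] = []
        self._oldest: Optional[float] = None

    def add(self, url: str) -> None:
        if not self._buffer:
            self._oldest = self._clock()  # start the age timer for a new batch
        self._buffer.append(url)
        if len(self._buffer) >= self._max_items:
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer) to flush a batch that is too old."""
        if self._buffer and self._clock() - self._oldest >= self._max_age:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._oldest = None
        self._send(batch)
```

Raising max_items or max_age_seconds trades freshness for fewer, larger requests, which fits a workload where the data doesn't need to be real time.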

All in all: I advise you to start with the simplest solution there is, which in your case seems to be REST. Design your code so that a transition to another protocol is as simple as possible. Optimize later if needed. Always measure.
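One way to keep a later protocol switch cheap is to hide the transport behind a small interface. A minimal sketch, assuming hypothetical names throughout (Transport, HttpTransport, InMemoryTransport, report_visits) and a placeholder host/path for the REST endpoint:

```python
import http.client
import json
from typing import Protocol


class Transport(Protocol):
    """Anything that can deliver one batch of URLs to the backend."""

    def send(self, urls: list[str]) -> None: ...


class HttpTransport:
    """REST implementation: one JSON POST per batch (host and path are placeholders)."""

    def __init__(self, host: str, path: str) -> None:
        # HTTPSConnection connects lazily, on the first request.
        self._conn = http.client.HTTPSConnection(host)
        self._path = path

    def send(self, urls: list[str]) -> None:
        body = json.dumps({"urls": urls}).encode("utf-8")
        self._conn.request("POST", self._path, body=body,
                           headers={"Content-Type": "application/json"})
        response = self._conn.getresponse()
        response.read()  # drain so the connection can be reused
        if response.status >= 400:
            raise RuntimeError(f"backend rejected batch: HTTP {response.status}")


class InMemoryTransport:
    """Records batches instead of sending them; handy in tests and benchmarks."""

    def __init__(self) -> None:
        self.sent: list[list[str]] = []

    def send(self, urls: list[str]) -> None:
        self.sent.append(list(urls))


def report_visits(transport: Transport, urls: list[str]) -> None:
    """Application code depends only on the Transport interface."""
    if urls:
        transport.send(urls)
```

Swapping REST for a WebSocket or custom TCP transport would then mean writing one more class with a send method, leaving report_visits and its callers untouched.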

Btw, there are lots of valid privacy and security concerns around your extension. For example, I'm quite surprised that you didn't mention TLS at all. It matters not only for security but also for performance: establishing TLS connections is not free (although once a connection is established, encryption does not affect performance much).

Answer:

Putting my discomfort aside (privacy, anyone?)...

Assuming your extension collates the information, you might consider "pushing" it to the server every time the browser starts or quits, and then once again every hour or so (users hardly ever quit their browsers these days)... this would make REST much more logical.
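The push schedule described above boils down to a small decision function. A minimal sketch under stated assumptions: BrowserEvent, should_push and the one-hour interval are hypothetical, and how the extension actually learns about startup, shutdown and timer events is left out.

```python
from enum import Enum
from typing import Optional

PUSH_INTERVAL_SECONDS = 60 * 60  # "every hour or so"


class BrowserEvent(Enum):
    STARTUP = "startup"
    SHUTDOWN = "shutdown"
    TIMER = "timer"  # periodic wake-up while the browser is running


def should_push(event: BrowserEvent, now: float, last_push: Optional[float]) -> bool:
    """Decide whether to send the collated URLs to the server now."""
    if event in (BrowserEvent.STARTUP, BrowserEvent.SHUTDOWN):
        return True  # always push on browser start and quit
    if last_push is None:
        return True  # nothing pushed yet in this session
    return now - last_push >= PUSH_INTERVAL_SECONDS
```

A push triggered at shutdown may not complete, so keeping unsent data locally until the next successful push is a sensible companion to this policy.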

If you aren't collating the information on the client side, you might prefer a WebSocket implementation that pushes data in real time.

However, whatever you decide, you would also want to decouple the API from the transmission layer.

This means that (ignoring authentication paradigms) the WebSocket and REST implementations would look largely the same and be routed to the same function that contains the actual business logic... a function you could also call from a script or from the terminal. The network layer details should be irrelevant as far as the API implementation is concerned.
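A minimal sketch of that decoupling, with hypothetical names throughout: record_visits holds the business logic, and two thin adapters translate a REST request body and a WebSocket text message (assumed to carry the same JSON shape) into a call to it. A plain list stands in for the real persistence layer, and framework wiring such as routing and authentication is deliberately left out.

```python
import json


def record_visits(store: list, user_id: str, urls: list[str]) -> int:
    """Business logic: persist the visits. Knows nothing about the network."""
    store.extend((user_id, url) for url in urls)
    return len(urls)


def handle_rest_post(store: list, user_id: str, body: bytes) -> tuple[int, str]:
    """Adapter for a REST endpoint such as POST /visits (hypothetical route)."""
    payload = json.loads(body)
    count = record_visits(store, user_id, payload["urls"])
    return 201, json.dumps({"stored": count})


def handle_ws_message(store: list, user_id: str, message: str) -> str:
    """Adapter for a WebSocket text message carrying the same JSON payload."""
    payload = json.loads(message)
    count = record_visits(store, user_id, payload["urls"])
    return json.dumps({"stored": count})
```

Because record_visits takes plain Python values, it can also be called directly from a script or a REPL, e.g. record_visits(rows, "alice", ["https://example.com/"]).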

As a last note: I would never knowingly install an extension that collects so much data about me, especially since URLs often contain private information (used for REST API routing). Please reconsider whether you want to take part in creating such a product... they cannot violate our privacy if we don't build the tools that make it possible.
