How can I ensure all possible data received over TCP/UDP or decide when to discard incomplete?

To make things simple, let's say I work with plain ASCII data and the data I want to receive looks like:
<command|arg1|arg2|argN...>
where <> act as simple header/footer separators.
Because of the nature of network data, I know I can receive something like:
<command|arg1|arg2|argN...>%^@#&%$^#%@<commands|arg1|ar
which is not easy to parse, for various reasons I have in mind. Yes, I can just save the leftover buffer for the next data-processing pass, but there are some questions I can't easily figure out the answers to:

  1. How much does UDP vs. TCP affect data reliability? TCP data may fail to arrive for the same reasons UDP data get dropped, e.g. the client just lost its ISP connection, whatever. So what difference does it make, outside of nightmares like completely restarting the TCP listener on the server from scratch every time one client decides to close the connection unexpectedly, because now the TCP listener just throws and is in a broken state? Is there a guarantee with TCP that if a client has sent a message under X bytes, where X is the buffer size on the underlying socket, then I always either receive the full data or get an error?

  2. In light of #1, say I received <command|arg1|arg2|a and now I have to wait for the rest of the data. I poll the TCP/UDP listener buffer every 1 second. On the next poll tick, I still don't receive enough data to complete the query, or maybe receive none at all. What now? How do I proceed? How do I decide when and what to drop? Will the data ever arrive? Do I just give up?

  3. Imagine I received incomplete data from Client1 and now, after I've done the first reading pass, I return after 1 second of sleep, but this time a different Client2 was accepted. So now if I read data into the saved buffer I have a mish-mash of incoherent data. Do I now have to allocate a buffer for each client and keep track of them? Is this kind of approach typical and unavoidable in a multi-client TCP server? How would I even distinguish unique clients, by what criteria? Just IP:port? What if the port differs on the next send cycle?

My current setup, for context and posterity:

  • On the UDP/TCP listener I have set up a buffer large enough to always accommodate the largest possible command request (512 bytes). My understanding is that, in theory, this means that if one request was sent I can always receive it in full without any overflow; the only reason I might need to postpone/save data is if multiple pieces of data from the same IP were received during the listener sleep time (1 s).
  • I only fully parse data from clients that are authorized; the rest of the data is scanned for an auth command or dropped. The auth command itself is incredibly short to ensure it always arrives in one go, so the data-parsing code for it is simpler and never preserves a previous buffer.

CodePudding user response:

TCP provides an unstructured byte stream, i.e. there are no implicit boundaries in the data caused by how they were sent. UDP instead provides messages, i.e. each send generates a new message which is matched by exactly one recv on the receiving side.
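
You can see this difference on your own machine with a minimal Python sketch over loopback (everything here is illustrative, not your server code):

    import socket

    # Stream: two sends can come back as one read -- there are no boundaries.
    a, b = socket.socketpair()             # connected pair of stream sockets
    a.sendall(b"<cmd|1>")
    a.sendall(b"<cmd|2>")
    print(b.recv(4096))                    # typically b'<cmd|1><cmd|2>' in one read
    a.close()
    b.close()

    # Datagrams: one sendto is matched by exactly one recvfrom.
    u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    u.bind(("127.0.0.1", 0))               # any free loopback port
    u.sendto(b"<cmd|1>", u.getsockname())
    u.sendto(b"<cmd|2>", u.getsockname())
    print(u.recvfrom(4096))                # (b'<cmd|1>', addr)
    print(u.recvfrom(4096))                # (b'<cmd|2>', addr)
    u.close()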

Because of the nature of network data, I know I can receive something like: <command|arg1|arg2|argN...>%^@#&%$^#%@<commands|arg1|ar

TCP guarantees that the data in the byte stream arrive in the same order as sent by the peer, and that no data get lost or duplicated inside the byte stream. There will also be no junk data inside the byte stream, i.e. it will only contain what was actually sent by the peer, contrary to what you seem to imply.

UDP does not provide such strong guarantees. Messages might be lost, duplicated or reordered, but only as whole messages. Junk data does not magically appear either.

Is there a guarantee with TCP that if a client has sent a message under X bytes, where X is the buffer size on the underlying socket, then I always either receive the full data or get an error?

TCP does not provide any information about whether data are still missing at the end of the byte stream. Any interpretation of the byte stream as a sequence of messages must be implemented by the application; TCP does not provide it. Since TCP has no notion of message semantics, it also cannot report errors at the granularity of a message.
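
Since the framing has to live in your application, here is a minimal sketch of an incremental extractor for your <...> protocol (my code, not a library call): each call consumes the complete frames from an accumulating buffer and leaves a partial tail for the next recv:

    def extract_frames(buf: bytearray) -> list:
        """Remove every complete <...> frame from buf; keep the unfinished tail."""
        frames = []
        while True:
            start = buf.find(b"<")
            if start == -1:                # no frame start at all: discard junk
                buf.clear()
                return frames
            end = buf.find(b">", start)
            if end == -1:                  # frame started but not yet complete
                del buf[:start]            # keep the partial frame for later
                return frames
            frames.append(bytes(buf[start + 1:end]))
            del buf[:end + 1]

    buf = bytearray(b"<command|arg1|arg2><part")
    print(extract_frames(buf))             # [b'command|arg1|arg2']
    buf += b"ial|x>"
    print(extract_frames(buf))             # [b'partial|x']

In practice you would also cap the buffer length (e.g. at your 512-byte maximum frame size) so a peer that never sends > cannot grow it without bound; hitting that cap is also a natural "give up and drop" criterion for your question #2.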

With UDP you either receive the full message or nothing of it. It does not provide any information about whether messages got lost.
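
If lost or duplicated datagrams matter to you, the usual application-level fix is to number the messages yourself. A sketch, assuming you add a sequence-number field to your protocol (it currently has none):

    def check_sequence(last_seen: int, seq: int):
        """Classify an incoming sequence number against the last one accepted."""
        if seq == last_seen + 1:
            return seq, "in order"
        if seq <= last_seen:
            return last_seen, "duplicate or reordered"
        return seq, "%d message(s) lost" % (seq - last_seen - 1)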

Imagine I received incomplete data from Client1 and now, after I've done the first reading pass, I return after 1 second of sleep, but this time a different Client2 was accepted. So now if I read data into the saved buffer I have a mish-mash of incoherent data

Each TCP client gets its own byte stream on its own socket. You only get a mish-mash of incoherent data if you read from different sockets into the same buffer, so use a separate buffer for each socket.
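
A sketch of that pattern with Python's selectors module; the port and read size are placeholders. Note that an error on one client's socket only tears down that one socket, so the listener itself never ends up in a "broken state":

    import selectors
    import socket

    sel = selectors.DefaultSelector()
    buffers = {}                            # one receive buffer per client socket

    listener = socket.socket()
    listener.bind(("0.0.0.0", 5000))        # placeholder port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)

    while True:
        for key, _ in sel.select(timeout=1):
            sock = key.fileobj
            if sock is listener:
                conn, addr = listener.accept()
                conn.setblocking(False)
                buffers[conn] = bytearray() # fresh buffer for this client only
                sel.register(conn, selectors.EVENT_READ)
            else:
                try:
                    data = sock.recv(4096)
                except OSError:             # this client died; others unaffected
                    data = b""
                if data:
                    buffers[sock] += data   # feed the frame extractor from here
                else:                       # empty read or error: peer is gone
                    sel.unregister(sock)
                    del buffers[sock]
                    sock.close()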

With UDP you always get whole messages, and each message comes from a single peer. recvfrom tells you where each specific message came from, even if you use a single unconnected UDP socket to receive messages. A connected socket will only receive messages from the connected peer.
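
For example (port is a placeholder):

    import socket

    u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    u.bind(("0.0.0.0", 5001))      # placeholder port

    data, peer = u.recvfrom(512)   # peer is the (ip, port) this datagram came from
    print(len(data), "bytes from", peer)

    u.connect(peer)                # from now on only this peer's datagrams arrive
    data = u.recv(512)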

How would I even distinguish unique clients, by what criteria? Just IP:port? What if the port differs on the next send cycle?

IP:port is all you get from accept or recvfrom. If you need something more, like the semantics of distinct (authenticated) users, then you need to implement it inside your application.
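
A sketch of such an application-level session table; check_credentials and handle_command are hypothetical stand-ins for your own logic:

    # The transport only gives you (ip, port); mapping that to a user is yours.
    sessions = {}                                   # (ip, port) -> user name

    def check_credentials(user, password):          # hypothetical stub
        return (user, password) == (b"alice", b"secret")

    def handle_command(user, parts):                # hypothetical stub
        print(user, parts)

    def on_frame(peer, frame):
        parts = frame.split(b"|")
        if parts[0] == b"auth" and len(parts) == 3: # e.g. b"auth|alice|secret"
            if check_credentials(parts[1], parts[2]):
                sessions[peer] = parts[1]
            return
        user = sessions.get(peer)
        if user is None:
            return                                  # unauthenticated: drop it
        handle_command(user, parts)

If the client's source port changes between sends, the old (ip, port) key no longer matches, so the client must either keep one connection open, re-authenticate, or carry a session token inside every message.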

The auth command itself is incredibly short to ensure it always arrives in one go

TCP does not provide such a guarantee. And UDP does not guarantee that the auth message is received at all, or that it arrives before any other message.
