I have a problem with a TCP socket that fails with "broken pipe" after a couple of hours of disuse. The server sends packets to the client every 300 ms, but the client sends packets back on the socket very rarely, sometimes days apart. When the broken pipe is reported, the server-to-client direction of the socket is still alive. The error only appears after two consecutive packets are sent from the client to the server, which is the expected behaviour when one half of the socket is closed. Neither the server nor the client notices that the socket is closed until the client sends data.
Update: The description above is not entirely correct. The client spawns two sockets, and one of them is never emptied, which leads to the problem (see the answer).
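The "two writes before the error" pattern can be reproduced with a minimal Python sketch. It assumes Linux and a peer that fully closes its socket with close(), not shutdown(SHUT_WR). The function name `writes_after_peer_close` is hypothetical. The local kernel accepts the first write after the peer has gone. The peer's kernel answers that write with a RST, and only the next write reports the error.

```python
import socket
import time


def writes_after_peer_close(max_writes=10, pause=0.05):
    """Count successful sends to an already-closed peer and name the error that follows.

    Assumption: the peer fully close()s its socket (not shutdown(SHUT_WR)).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    listener.close()

    peer.close()  # peer sends FIN; the sender only learns the peer is gone when it writes
    successes = 0
    try:
        for _ in range(max_writes):
            sender.sendall(b"ping")  # accepted locally; the closed peer answers with RST
            successes += 1
            time.sleep(pause)  # give the RST time to arrive
        return successes, "no error"
    except (BrokenPipeError, ConnectionResetError) as exc:
        return successes, type(exc).__name__
    finally:
        sender.close()
```

On Linux the call typically returns `(1, 'BrokenPipeError')`: one write succeeds and the second one fails. Python ignores SIGPIPE by default, so the failure surfaces as an exception and does not kill the process.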
How do I go about debugging this issue?
Does anyone have any idea why this would happen?
Some backstory: I re-implemented a socket server in Rust using std::net. The old implementation was in Python 3. The new server behaves exactly the same, and the clients are written in Python 3. The server and clients both run locally on a Fedora 27 Linux x86-64 machine. The problem does not occur when running the Python server, which should rule out the operating system or hardware as the cause, right?
Answer:
I found the issue, thanks to busybee's suggestion. I used Wireshark and found that after a couple of hours the client's ACKs advertised a zero window (Wireshark marks these as [TCP ZeroWindow]). This meant the receiving client socket's buffer was full.
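The zero-window stall can be reproduced on loopback with a sketch like the following. The name `bytes_until_stall` is hypothetical. One end never calls recv(), like the client's unread socket. The other end writes in non-blocking mode, so the stall shows up as BlockingIOError and the program does not hang. While it runs, the Wireshark display filter `tcp.analysis.zero_window` should isolate the zero-window ACKs sent by the idle end.

```python
import socket


def bytes_until_stall(chunk=64 * 1024, limit=256 * 1024 * 1024):
    """Send to a peer that never reads until the kernel refuses more data.

    When the idle receiver's buffer is full it advertises a zero window. The
    sender's buffer then fills as well, and a non-blocking send raises BlockingIOError.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    idle_reader, _ = listener.accept()  # never calls recv(), like the client's writer socket
    listener.close()

    sender.setblocking(False)  # report the stall instead of blocking forever
    payload = b"x" * chunk
    sent = 0
    try:
        while sent < limit:
            try:
                sent += sender.send(payload)
            except BlockingIOError:
                return sent  # send buffer full because the idle reader's window has closed
        return sent
    finally:
        sender.close()
        idle_reader.close()
```

The returned byte count is roughly the idle end's receive buffer plus the sender's send buffer. The exact figure depends on the kernel's buffer sizing.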
This led me to dig into the client, where I found the cause. The client spawned one reader socket and one writer socket (I did not know this was the case). The writer socket's receive buffer was never read, so it filled up and the connection choked after a few hours. The issue is now solved!
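One way to keep such a socket from choking is to drain it on the client side. The sketch below assumes the client really does need a separate socket that it only writes to. `start_drain` and `push_through_drained_socket` are hypothetical helpers, not code from the original client. A daemon thread reads whatever arrives and discards it, so the kernel receive buffer never fills and the window never drops to zero.

```python
import socket
import threading


def start_drain(sock, bufsize=65536):
    """Hypothetical helper: read and discard everything arriving on a write-only socket."""

    def drain():
        try:
            # recv() returns b"" once the peer has closed its sending side.
            while sock.recv(bufsize):
                pass  # data is discarded; the point is to keep the receive buffer empty
        except OSError:
            pass  # the socket was reset or closed; nothing left to drain

    thread = threading.Thread(target=drain, daemon=True)
    thread.start()
    return thread


def push_through_drained_socket(total=8 * 1024 * 1024):
    """Hypothetical demo: the server side can push far more than one buffer's worth."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    writer_sock = socket.create_connection(listener.getsockname())  # client's "writer" socket
    server_side, _ = listener.accept()
    listener.close()

    drainer = start_drain(writer_sock)
    server_side.settimeout(10)  # without draining, sendall() would stall and time out
    try:
        server_side.sendall(b"x" * total)
    finally:
        server_side.close()  # the drain thread sees EOF and exits
    drainer.join(timeout=10)
    writer_sock.close()
    return not drainer.is_alive()
```

If the server has no reason to send on the writer socket at all, you could avoid the extra thread. Either stop sending on that socket or use a single socket for both directions.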
However, this means that a Rust socket in which one direction is choked (the send buffer on the server side is probably full too) will fail with "broken pipe" as soon as a packet arrives from the other direction. A Python socket does not behave this way.
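The guess about the server's send buffer could be checked directly. Linux exposes per-socket queue sizes through the SIOCINQ and SIOCOUTQ ioctls, which are documented in tcp(7). They share their values with FIONREAD and TIOCOUTQ, and Python's `termios` module provides both. For a socket owned by another process, such as the Rust server, the same figures appear as Recv-Q and Send-Q in `ss -tn` output. The sketch below assumes Linux. `queued_bytes` and `unread_bytes_demo` are hypothetical names.

```python
import fcntl
import socket
import struct
import termios
import time


def queued_bytes(sock):
    """Return (unread bytes in the receive queue, bytes still in the send queue) on Linux.

    termios.FIONREAD and termios.TIOCOUTQ have the same values as SIOCINQ and SIOCOUTQ.
    """

    def ioctl_int(request):
        raw = fcntl.ioctl(sock.fileno(), request, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]

    return ioctl_int(termios.FIONREAD), ioctl_int(termios.TIOCOUTQ)


def unread_bytes_demo(n=10_000):
    """Hypothetical demo: send n bytes to a peer that never reads, then inspect the queues."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    try:
        sender.sendall(b"x" * n)
        deadline = time.monotonic() + 2.0
        # Loopback delivery is fast, but poll briefly rather than assume it is instant.
        while queued_bytes(receiver)[0] < n and time.monotonic() < deadline:
            time.sleep(0.01)
        return queued_bytes(receiver)[0], queued_bytes(sender)[1]
    finally:
        sender.close()
        receiver.close()
```

On the stuck connection, a growing receive queue on the client's writer socket, together with a non-zero send queue on the server, would confirm the picture described above.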