I have a set of multithreaded executables written in C, run from the crontab on an Ubuntu machine, that primarily fetch and process data from websocket connections. Each of these executables is run in a while loop, so that if an executable terminates, it is started again immediately.
Whenever I run these executables, they tend to run fine for several hours, but then they all terminate unexpectedly at the same time. From that point on, the while loop restarts them, they run for a few seconds, and then terminate unexpectedly again, repeating this cycle ad infinitum.
There are no core files generated (even though I have set "ulimit -c unlimited" and built the executables with "-g -ggdb", so they do generate a core file upon segfault). Also, "dmesg" does not show anything indicating this repeated termination/restarting of the executables, and in fact none of the logs in /var/log seem to show anything of note, so I assume they were not killed due to OOM as per my initial guess. There is also plenty of disk space.
How can I debug an issue like this? Is there anywhere else I can look for error messages?
I forgot to mention that there is nothing of note being printed to stdout/stderr either. Also, another weird thing is that if I kill the script containing the while loop corresponding to one of the executables (without touching any of the other while loops) and then run that script manually on the terminal, the corresponding executable seems to run fine without termination, even as the other executables are still continually restarting and terminating instantly.
I believe I've narrowed it down to something related to stdout. When I log the websocket output to stdout, the continual restarting and termination happens. When I remove that logging, the executable doesn't crash anymore.
It turns out that when the executable prints the websocket output to stdout, that output is piped to "taskset -c 0 gzip -c", and apparently those gzip processes had terminated for some reason without my noticing. Any ideas why that might be, or how to debug it?
CodePudding user response:
Try capturing the stderr output of the main while loop; it may show something that is printed to the console but never logged.
If it is a shell script, append >output.log 2>&1 at the end of the Linux command.
If not, you can tail /proc/<pid>/fd/1, where <pid> is the Linux process ID.
CodePudding user response:
Could bad websocket input be crashing your server?
One option to consider is an IDS such as Bro (now called Zeek, https://zeek.org/) to capture all traffic to and from your server. You will need to mirror the traffic on the switch and record accurate timestamps of your crashes.
Once you have recorded the traffic around a crash, consider playing it back by exporting the packets and crafting a replay (for example, copy the hex payload into a small client program that sends it over a socket, and compile that). If it crashes again, you have a repro to debug.