I've got a program with a Heisenbug I'm trying to diagnose. Using a combination of gdb and Ghidra, I've been able to track down the crash to a particular section. Here's the gist of my code:
FD_ZERO(&readfds);
FD_SET(sock1, &readfds);
max_fd = sock1;
if ( some_condition ) {
FD_SET(sock2, &readfds);
if ( sock2 > max_fd ) {
max_fd = sock2;
}
}
if ( select(max_fd + 1, &readfds, NULL, NULL, &timer) == -1 ) {
goto error;
}
if ( FD_ISSET(sock1, &readfds) ) {
...
}
if ( FD_ISSET(sock2, &readfds) ) {
...
}
I've been able to narrow down the crash to the expansion of that last FD_ISSET macro. Specifically, it calls __fdelt_chk, which ultimately leads to my shell reporting
*** buffer overflow detected ***: terminated
However, if I change the code to
bool using_sock2 = false;
...
if ( some_condition ) {
using_sock2 = true;
...
}
...
if ( using_sock2 && FD_ISSET(sock2, &readfds) ) {
...
}
the problem goes away.
Clearly, I've invoked some kind of undefined behavior. However, I looked at the man page and I didn't see any warnings/requirements that seemed relevant. What exactly is causing this crash?
EDIT: Running the program under gdb or valgrind makes the error go away. The only way I've been able to locate the source of the crash is by running the program normally and then attaching with gdb from another terminal.
CodePudding user response:
One thing to be careful about with fd_set/FD_SET/FD_ISSET is that these sets have a fixed size: there is only room for FD_SETSIZE file descriptors in an fd_set. On Linux (you don't say what OS you are using) FD_SETSIZE is 1024, which matches the default soft limit (ulimit -n) of 1024 open file descriptors, so you won't see problems UNLESS you've raised the limit for your process (1024 is only the soft limit; the hard limit is usually much larger).
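To find out whether a process is actually in that situation, one option is to compare its soft descriptor limit against FD_SETSIZE at startup. The following is only a sketch: the function name fds_can_exceed_fd_setsize is made up for illustration, and it assumes a POSIX system that provides getrlimit and RLIMIT_NOFILE.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper (not a library function).
 * Returns 1 if the soft RLIMIT_NOFILE permits descriptor numbers
 * >= FD_SETSIZE, 0 if it does not, and -1 if getrlimit() fails. */
int fds_can_exceed_fd_setsize(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) == -1) {
        return -1;
    }
    /* rlim_cur is the soft limit: descriptors are numbered
     * 0 .. rlim_cur - 1, so anything above FD_SETSIZE is risky. */
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return lim.rlim_cur > (rlim_t)FD_SETSIZE ? 1 : 0;
}
```

A result of 1 does not mean a bug exists, only that a descriptor too large for an fd_set could be handed out, so range checks become necessary.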
If this might be the case, you should always check that fd < FD_SETSIZE before calling FD_SET. Something like:
FD_ZERO(&readfds);
if (sock1 >= FD_SETSIZE) {
error("too many file descriptors!");
abort();
}
FD_SET(sock1, &readfds);
max_fd = sock1;
if ( some_condition ) {
if (sock2 >= FD_SETSIZE) {
error("too many file descriptors!");
abort();
}
FD_SET(sock2, &readfds);
if ( sock2 > max_fd ) {
max_fd = sock2;
}
}
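You should also make sure that none of your file descriptors holds some other invalid value, such as the -1 returned by a failed system call earlier on. Passing such a value to FD_SET or FD_ISSET is likewise an out-of-bounds access into the fd_set. That is likely what happens in your original code: when some_condition is false, sock2 is still passed to FD_ISSET, and if it is -1 or uninitialized, glibc's fortified __fdelt_chk aborts with the "buffer overflow detected" message. Your using_sock2 guard avoids this by never testing sock2 unless it was added to the set.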
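Both checks (not negative, and below FD_SETSIZE) can be folded into small wrappers so that no call site forgets them. This is a minimal sketch, not part of any library: the names fd_fits_in_set, fd_set_checked and fd_isset_checked are invented for illustration.

```c
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical helper: true only if fd may legally be used
 * with the FD_* macros, i.e. 0 <= fd < FD_SETSIZE. */
bool fd_fits_in_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper around FD_SET: adds fd to the set and
 * returns true, or returns false and leaves the set untouched
 * when fd is out of range. */
bool fd_set_checked(int fd, fd_set *set)
{
    if (!fd_fits_in_set(fd)) {
        return false;
    }
    FD_SET(fd, set);
    return true;
}

/* Hypothetical wrapper around FD_ISSET: an out-of-range fd is
 * reported as "not set" instead of being looked up. */
bool fd_isset_checked(int fd, fd_set *set)
{
    return fd_fits_in_set(fd) && FD_ISSET(fd, set);
}
```

With these, the last test in the question could be written as if ( fd_isset_checked(sock2, &readfds) ), which is simply false when sock2 is -1. An uninitialized sock2 is still undefined behavior to read, so it should be initialized to -1 when it is declared.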