TLDR: I have an HTTP server application written in C which launches some scripts using popen(). The scripts start a few daemons: wpa_supplicant and udhcpd. Those daemons seem to hold onto my HTTP server port after my server stops. Why?
During initialization, my HTTP server application uses popen() to launch a script to start wpa_supplicant and udhcpd to make sure my interfaces are ready to go. After the scripts execute, my application opens port 80 as you would expect.
The problem: When my application closes and goes through all the destructors, it correctly closes the socket with close(int_socket_val), yet trying to start my application a second time will fail because port 80 is not available.
Doing a netstat -tulpn shows that either wpa_supplicant or udhcpd is hanging onto my port 80. Interestingly, while my HTTP server is still running, netstat shows this same result, so my HTTP server is never listed as owning the port. Killing those applications with killall -9 wpa_supplicant udhcpd will free port 80 and allow me to start my HTTP server again. But why does this happen? This has proven a difficult problem to research.
For reference, here is the method I use to launch scripts and be able to read what was returned during those calls:
std::string ConnectionManager::exec(const std::string& command, bool strip)
{
    char buffer[EXEC_BUFFER_LEN];
    std::string result = "";
    // Open a pipe to the command's standard output
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
    {
        std::cout << "ERROR: ConnectionManager::exec() - failed to open command: " << command << std::endl;
        return result;
    }
    // Read until the command closes its end of the pipe
    while (fgets(buffer, EXEC_BUFFER_LEN, pipe) != NULL)
    {
        // Append each chunk so earlier output is not overwritten
        result += buffer;
    }
    pclose(pipe);
    if (strip)
    {
        removeLineEndings(result);
    }
    return result;
}
This is not a special case where port 80 is somehow magical. It happens on any port I use for development: starting my HTTP server on port 50000 has the same effect. Here is the netstat output for reference:
root@device:/usr/bin# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
.........
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 4267/udhcpd
.........
root@device:/usr/bin#
During a subsequent run, I might get wpa_supplicant hanging onto the port - that part seems random:
root@device:/usr/bin# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
.........
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 4393/wpa_supplicant
.........
root@device:/usr/bin#
For reference, here is a section of the script that calls these two daemons:
wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
udhcpc -i wlan0
CodePudding user response:
@G. Sleipen provided an accurate explanation of the problem. What did the trick for me was adding the suggested SOCK_CLOEXEC flag and, in addition, explicitly setting the FD_CLOEXEC flag with a subsequent fcntl() call. This may not be ideal for everyone, because the second call is not atomic the way SOCK_CLOEXEC is meant to be, but it provides a fallback in case your kernel does not support the SOCK_CLOEXEC flag. I'd be interested in an explanation of why SOCK_CLOEXEC alone did NOT work, but here's my solution:
int Socket::openServerSocket(uint16_t port)
{
    int hSocket;
    int flag;
    /* Create the TCP socket, requesting close-on-exec atomically */
    if ((hSocket = socket(PF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP)) < 0)
    {
        return -1;
    }
    /* Fallback: also set close-on-exec explicitly (not atomic) */
    fcntl(hSocket, F_SETFD, fcntl(hSocket, F_GETFD) | FD_CLOEXEC);
    /* Disable the Nagle (TCP No Delay) algorithm */
    flag = 1;
    if (-1 == setsockopt(hSocket, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag)))
    {
        close(hSocket); /* do not leak the descriptor on failure */
        return -1;
    }
    /* Set the Keep Alive property */
    flag = 1;
    if (-1 == setsockopt(hSocket, SOL_SOCKET, SO_KEEPALIVE, (char *)&flag, sizeof(flag)))
    {
        close(hSocket);
        return -1;
    }
    /* Allow the re-use of port numbers to avoid error */
    flag = 1;
    if (-1 == setsockopt(hSocket, SOL_SOCKET, SO_REUSEADDR, (char *)&flag, sizeof(flag)))
    {
        close(hSocket);
        return -1;
    }
    /* Set an explicit socket timeout value */
    struct timeval tv;
    tv.tv_sec = TIMEOUT_SEC;
    tv.tv_usec = 0;
    if (-1 == setsockopt(hSocket, SOL_SOCKET, SO_RCVTIMEO, (const char*)&tv, sizeof tv))
    {
        printf("ERROR: Socket::openServerSocket->setsockopt(port timeout)\n");
        close(hSocket);
        return -1;
    }
    /* Construct the server sockaddr_in structure */
    memset(&m_sockaddr, 0, sizeof(m_sockaddr));       /* Clear struct */
    m_sockaddr.sin_family = AF_INET;                  /* Internet/IP */
    m_sockaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* Incoming addr */
    m_sockaddr.sin_port = htons(port);                /* server port */
    /* Bind the server socket */
    if (bind(hSocket, (struct sockaddr *)&m_sockaddr, sizeof(m_sockaddr)) < 0)
    {
        close(hSocket);
        return -1;
    }
    /* Listen on the server socket */
    if (listen(hSocket, MAXPENDING) < 0)
    {
        close(hSocket);
        return -1;
    }
    return hSocket;
}
int Socket::acceptClient(int hSocket)
{
    socklen_t sockaddr_len = sizeof(m_sockaddr);
    int ret = accept4(hSocket, (struct sockaddr *)&m_sockaddr, &sockaddr_len, SOCK_CLOEXEC);
    if (ret >= 0)
    {
        /* Fallback: also set close-on-exec explicitly on the accepted socket */
        fcntl(ret, F_SETFD, fcntl(ret, F_GETFD) | FD_CLOEXEC);
    }
    return ret;
}
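For anyone who wants the same fallback with error checking, here is a minimal sketch. The helper names (has_cloexec, set_cloexec, open_cloexec_socket) are hypothetical and not part of my server code, and treating EINVAL from socket() as "the kernel rejected SOCK_CLOEXEC" is an assumption based on how socket(2) reports invalid type flags. Querying the flag with F_GETFD right after socket() may also help to find out whether SOCK_CLOEXEC was actually honoured on a given kernel.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if the close-on-exec flag is currently set on fd.
bool has_cloexec(int fd)
{
    const int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

// Sets FD_CLOEXEC on an already open descriptor.
// Returns 0 on success, -1 on error (errno is set by fcntl).
int set_cloexec(int fd)
{
    const int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
    {
        return -1;
    }
    if (flags & FD_CLOEXEC)
    {
        return 0; // already set, nothing to do
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

// Hypothetical helper: opens a socket that is guaranteed to be close-on-exec.
// Tries the atomic SOCK_CLOEXEC path first and falls back to fcntl().
int open_cloexec_socket(int domain, int type, int protocol)
{
    int fd = socket(domain, type | SOCK_CLOEXEC, protocol);
    if (fd < 0 && errno == EINVAL)
    {
        // Assumption: EINVAL means the kernel rejected the SOCK_CLOEXEC flag,
        // so retry without it. This path is not atomic.
        fd = socket(domain, type, protocol);
    }
    if (fd < 0)
    {
        return -1;
    }
    if (set_cloexec(fd) == -1)
    {
        close(fd); // do not leak the descriptor on failure
        return -1;
    }
    assert(has_cloexec(fd)); // sanity check of the postcondition
    return fd;
}
```

With this, the socket() call in openServerSocket() could become open_cloexec_socket(PF_INET, SOCK_STREAM, IPPROTO_TCP). The non-atomic fallback still leaves a small window if another thread calls popen() between socket() and fcntl().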
CodePudding user response:
popen() forks the process and executes a shell in the child. The child inherits the file descriptors of the parent. When the shell executes udhcpd, that in turn causes another fork and exec. udhcpd then daemonizes itself, but judging from its source code it does not close all open file descriptors first. This means udhcpd keeps holding a file descriptor for a socket your program opened earlier, and thus keeps that socket alive.
There are several workarounds. The easiest would be to ensure any file descriptors you open in your program have the CLOEXEC flag set. For example, create your listening socket with:
int listen_fd = socket(AF_SOMETHING, SOCK_STREAM | SOCK_CLOEXEC, 0);
This ensures that if you call popen(), file descriptors with that flag set are closed as soon as the child calls exec, so the shell and anything it launches never get to hold them.
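To see the effect for yourself, here is a small Linux-only sketch. The function name visible_in_popen_child is hypothetical, and the check relies on /proc being mounted: it asks the shell started by popen() whether a given descriptor number is still open in the child.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if descriptor number `fd` of the calling process is still
// open inside the shell that popen() starts. Linux-only: uses /proc/self/fd.
// Also returns false if popen() itself fails.
bool visible_in_popen_child(int fd)
{
    assert(fd >= 0);
    // The child resolves /proc/self/fd/<n> against its own descriptor table.
    const std::string cmd = "test -e /proc/self/fd/" + std::to_string(fd) +
                            " && echo open || echo closed";
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe)
    {
        return false;
    }
    char buffer[16] = {0};
    const bool got_line = fgets(buffer, sizeof buffer, pipe) != NULL;
    pclose(pipe);
    return got_line && std::string(buffer) == "open\n";
}
```

Calling it on a socket created with plain SOCK_STREAM should report the descriptor as visible in the child, while a socket created with SOCK_STREAM | SOCK_CLOEXEC should not be. Any daemon started from that shell sees the same descriptor table, which is how udhcpd ends up owning the listening socket.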
If your program causes udhcpd to start, it would also be prudent to make it stop udhcpd before it terminates. This would also have avoided the issue. Probably a combination of both would be best though.