I'm trying to read a whole HTTP/1.1 POST request through a socket-based HTTP proxy, but I can't read the body of the request right after reading the headers.
The main code handling the POST request is:
import socket
...
# request_buffer is initialized with the request's first line (method, URI and HTTP version).
request_buffer = http_status_line
socket_file = client_socket.makefile()
raw_headers = recv_headers(socket_file)
socket_file.close()
request_buffer += raw_headers
headers = dict_headers(raw_headers)
body_len = int(headers['Content-Length'][0])
print(repr(request_buffer))
raw_request = recv_body(client_socket, body_len)
request_buffer += raw_request
print(repr(raw_request))
server_socket.send(request_buffer)
The auxiliary functions are:
def recv_headers(socket_file):
    raw_headers = ''
    while True:
        header = socket_file.readline()
        raw_headers += header
        if len(header) == 2:  # if header == '\r\n'
            break
    return raw_headers

def recv_body(conn_socket, body_len):
    request_body = ''
    bytes_read = 0
    body_chunk = conn_socket.recv(body_len - bytes_read)
    while len(body_chunk) > 0:
        request_body += body_chunk
        bytes_read += len(body_chunk)
        body_chunk = conn_socket.recv(body_len - bytes_read)
    return request_body
Note: I've omitted the source code of dict_headers() because it works well, and I wanted to minimize the amount of code for your convenience.
In addition, I already made sure that body_len has the right value (the value in the Content-Length header).
Printed result:
As you can see, raw_request (the request body) is not printed at all.
The original POST request being forwarded through the proxy:
The original request does have a body, which is: log=test&pwd=test
Any help would be much appreciated.
Answer:
socket_file = client_socket.makefile()
raw_headers = recv_headers(socket_file)
...
raw_request = recv_body(client_socket, body_len)
With makefile you are using buffered I/O inside recv_headers. The later recv inside recv_body instead works with the original socket object and thus uses unbuffered I/O. Mixing buffered and unbuffered I/O is a recipe for trouble.
The problem is that the initial buffered I/O might internally retrieve more data from the underlying socket than is actually needed for reading the headers. These extra data are kept in the internal buffer of socket_file and are available to socket_file.read. They are no longer available to client_socket.recv, though, since they have already been retrieved from the underlying socket.
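The read-ahead can be observed directly. The following is a small sketch, assuming Python 3; it uses socket.socketpair() as a stand-in for the client connection, and the function name demo_read_ahead and the sample request are made up for illustration, not taken from the question.

```python
import select
import socket

def demo_read_ahead():
    # A connected socket pair stands in for the client and the proxy's
    # client_socket (illustrative setup only).
    client, proxy_side = socket.socketpair()
    body = b"log=test&pwd=test"
    request = (
        b"POST /login HTTP/1.1\r\n"
        b"Host: example.test\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n" + body
    )
    client.sendall(request)

    # Buffered reader on top of the socket, as makefile() creates it.
    reader = proxy_side.makefile("rb")
    while reader.readline() != b"\r\n":
        pass  # consume the request line and the headers

    # Ask the OS whether the raw socket still has unread data.
    # For a small request this is typically False: the first readline()
    # already pulled the body into the reader's internal buffer.
    readable, _, _ = select.select([proxy_side], [], [], 0)
    socket_has_unread_data = bool(readable)

    # The body is still reachable, but only through the buffered reader.
    body_from_buffer = reader.read(len(body))

    reader.close()
    client.close()
    proxy_side.close()
    return socket_has_unread_data, body_from_buffer
```

In this situation a call to recv on the raw socket would block waiting for bytes that have already been consumed, which matches the missing body in the question.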
Thus, never mix buffered and unbuffered I/O. Once you've switched to buffered I/O, stay with it.
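One way to stay with buffered I/O is to read the headers and the body from the same file object. This is only a sketch, assuming Python 3 and a request that carries a Content-Length header (no chunked transfer encoding); recv_request is a hypothetical helper name, not part of the original code.

```python
import socket

def recv_request(conn_socket):
    """Read request line, headers and body through one buffered reader."""
    reader = conn_socket.makefile("rb")  # binary mode, so everything is bytes
    try:
        request_line = reader.readline()
        raw_headers = b""
        content_length = 0  # assumption: no body if the header is absent
        while True:
            line = reader.readline()
            raw_headers += line
            if line in (b"\r\n", b"\n", b""):  # blank line ends headers, b"" is EOF
                break
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                content_length = int(value.strip())
        # Read the body from the SAME reader, so bytes that were already
        # buffered while reading the headers are not lost.
        body = reader.read(content_length)
    finally:
        reader.close()  # closes the file object, not the underlying socket
    return request_line + raw_headers + body
```

The returned bytes can then be forwarded as a whole, for example with server_socket.sendall(recv_request(client_socket)).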