Python socket - Couldn't read HTTP POST request body

Time:02-13

I'm trying to read a whole HTTP/1.1 POST request through a socket-based HTTP proxy, but I can't read the body of the request right after reading the headers.

The main code handling the POST request is:

import socket
...
# request_buffer is initialized with the request's first line (method, URI and HTTP version).
request_buffer = http_status_line
socket_file = client_socket.makefile()
raw_headers = recv_headers(socket_file)
socket_file.close()
request_buffer += raw_headers
headers = dict_headers(raw_headers)
body_len = int(headers['Content-Length'][0])
print(repr(request_buffer))
raw_request = recv_body(client_socket, body_len)
request_buffer += raw_request
print(repr(raw_request))
server_socket.send(request_buffer)

The auxiliary functions are:

def recv_headers(socket_file):
    raw_headers = ''
    while True:
        header = socket_file.readline()
        raw_headers += header
        if len(header) == 2:     # if header == '\r\n'
            break
    return raw_headers

def recv_body(conn_socket, body_len):
    request_body = ''
    bytes_read = 0
    body_chunk = conn_socket.recv(body_len - bytes_read)
    while len(body_chunk) > 0:
        request_body += body_chunk
        bytes_read += len(body_chunk)
        body_chunk = conn_socket.recv(body_len - bytes_read)
    return request_body

Note: I've omitted the source code of dict_headers() because it's working well, and I wanted to minimize the amount of code for your convenience. In addition, I already made sure that body_len has the right value (the value in the Content-Length header).

Printed result:

[Screenshot of the printed output]

As you can see, raw_request (the request body) is not being printed at all.

The original POST request, which is being forwarded through the proxy:

[Screenshot of the original POST request]

The original request does have a body: log=test&pwd=test.

Any help would be much appreciated.

CodePudding user response:

socket_file = client_socket.makefile()
raw_headers = recv_headers(socket_file)
...
raw_request = recv_body(client_socket, body_len)

With makefile you are using buffered I/O inside recv_headers. The later recv inside recv_body instead works on the original socket object and is therefore unbuffered. Mixing buffered and unbuffered I/O is a recipe for trouble.

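The problem is that the initial buffered I/O may retrieve more data from the underlying socket than it actually needs for reading the headers. For a short request, a single read into the buffer can pull in the headers and the whole body at once. The extra data is kept in the internal buffer of socket_file and is available to socket_file.read. It is no longer available to client_socket.recv, though, since it has already been retrieved from the underlying socket. In your code the body has most likely ended up in that buffer, and socket_file.close() then throws it away before recv_body is ever called.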
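The effect can be reproduced without a real proxy. The following is only a sketch under a few assumptions: it uses Python 3 with a binary-mode makefile("rb"), a local socket.socketpair() in place of the browser connection, and a made-up request (the /login path and Host value are hypothetical; only the body is taken from the question). It reads the headers through the file object, then checks whether the raw socket still has anything to deliver.

```python
import select
import socket

# Hypothetical request; only the body matches the one in the question.
BODY = b"log=test&pwd=test"
REQUEST = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
    b"\r\n" + BODY
)


def demo():
    # socketpair() gives two connected sockets, standing in for
    # the client and the proxy side of one connection.
    client_end, proxy_end = socket.socketpair()
    try:
        client_end.sendall(REQUEST)

        # Buffered reader on top of the proxy-side socket.
        socket_file = proxy_end.makefile("rb")

        # Consume the request line and headers up to the blank line.
        while socket_file.readline() not in (b"\r\n", b""):
            pass

        # Poll the raw socket (timeout 0): is there anything left to recv()?
        readable, _, _ = select.select([proxy_end], [], [], 0)
        left_on_socket = bool(readable)

        # The body is still reachable, but only through the file object.
        buffered_body = socket_file.read(len(BODY))
        socket_file.close()
        return left_on_socket, buffered_body
    finally:
        client_end.close()
        proxy_end.close()


if __name__ == "__main__":
    left_on_socket, buffered_body = demo()
    print("data left on raw socket:", left_on_socket)
    print("body held by the file object:", buffered_body)
```

With a request this small, the raw socket reports nothing left to read while socket_file.read still returns the body, which is the situation recv_body runs into.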

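Thus, never mix buffered and unbuffered I/O on the same socket. Once you've switched to buffered I/O, stay with it: keep socket_file open and read the body with socket_file.read(body_len) instead of client_socket.recv.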
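One way to apply this to the code from the question is sketched below. It assumes Python 3 and a binary-mode file object; content_length and read_request are hypothetical helpers written for this example (content_length is a minimal stand-in for the omitted dict_headers), and recv_body now takes the file object rather than the socket.

```python
import socket


def recv_headers(socket_file):
    """Read raw header lines up to and including the blank line."""
    raw_headers = b""
    while True:
        line = socket_file.readline()
        raw_headers += line
        if line in (b"\r\n", b"\n", b""):  # blank line, or peer closed
            break
    return raw_headers


def content_length(raw_headers):
    """Hypothetical stand-in for dict_headers(): find Content-Length."""
    for line in raw_headers.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip().decode("ascii"))
    return 0


def recv_body(socket_file, body_len):
    """Read the body from the SAME buffered file object as the headers."""
    body = socket_file.read(body_len)  # returns fewer bytes only at EOF
    if len(body) < body_len:
        raise ConnectionError("connection closed before the full body arrived")
    return body


def read_request(client_socket):
    """Hypothetical helper: return the full raw request as bytes."""
    # Closing the file object does not close client_socket itself.
    with client_socket.makefile("rb") as socket_file:
        request_line = socket_file.readline()
        raw_headers = recv_headers(socket_file)
        body = recv_body(socket_file, content_length(raw_headers))
    return request_line + raw_headers + body
```

The result can then be forwarded with server_socket.sendall(read_request(client_socket)); sendall is used here because send may transmit only part of the buffer.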
