Why can I only read http request from a TCPListener in fixed sized chunks?

Time:08-07

I am trying to figure out how to receive and parse an HTTP request over TCP in Rust.

That's how my code looked originally:

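use std::io::Read;
use std::net::TcpListener;

fn main() {
    let tcp = TcpListener::bind("127.0.0.1:80").expect("couldn't bind tcp listener");
    println!("listening at {}", tcp.local_addr().expect("can't get local socket address"));

    // Read raw bytes into a Vec<u8>; no unsafe access to a String's buffer is needed.
    let mut heyhey: Vec<u8> = Vec::new();
    if let Ok((mut a, _b)) = tcp.accept() {
        // read_to_end only returns at EOF, i.e. once the peer closes the connection,
        // so this call blocks while the browser keeps the connection open.
        dbg!(a.read_to_end(&mut heyhey));
        println!(">> {}", String::from_utf8_lossy(&heyhey));
    }

    println!("terminating");
}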

But when I ran this code and navigated to localhost in my browser (msedge), accept returned, but the read call blocked and the rest of the code never executed.

While searching for a solution I stumbled on the article from the Rust book. The only significant difference I noticed was the use of read() instead of read_to_string() or read_to_end().

This made me think that there is no EOF terminator at the end of an HTTP request. That seems to be true as far as I can tell: the standard says the first line uses spaces to separate its three parts, a single CRLF ends each header line, and a double CRLF separates the headers from the data, but it never mentions a compulsory EOF.

So then I got curious how I should go about reading and parsing a request.

After some searching I came across the Content-Length header, but from my understanding it's not compulsory. Not only that, but the number of headers can also vary widely. So how should I go about parsing something like this? I don't want to do parsing in chunks, unless that's how it's usually done, because that could very easily turn out to be very messy.

So that basically left me with the following questions:

  • Is the absence of EOF indeed what causes read_to_string to never return?
  • Is the Content-Length header mandatory, or are there cases where it's omitted? If so, how do I handle those cases?
  • Can I avoid having intermediary buffers when reading from the stream?
  • What's the conventional way to do it in Rust?

Answer:

This made me think that there is no EOF terminator at the end of an HTTP request.

Correct. An HTTP request consists of a request header and a (possibly empty) request body. The header contains the information needed to determine how long the body is; see the standard for the details.

After some searching I came across the Content-Length header, but from my understanding it's not compulsory.

The size of the HTTP request body is determined by the Content-Length header or, alternatively, by the Transfer-Encoding header (for chunked encoding). A request that carries neither header has no body, which is the usual case for methods such as GET and HEAD.
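As a rough illustration of that rule, the sketch below decides how a body would have to be read from the raw header text. The `BodyFraming` enum and the `body_framing` function are hypothetical names invented for this example, and the parsing is deliberately minimal (a real server should reject a malformed Content-Length instead of ignoring it).

```rust
/// How the request body is delimited (hypothetical helper type for this sketch).
#[derive(Debug, PartialEq)]
enum BodyFraming {
    /// Neither header is present: the request has no body.
    None,
    /// Content-Length gave the exact number of body bytes.
    Length(usize),
    /// Transfer-Encoding: chunked; the body must be read chunk by chunk.
    Chunked,
}

/// Inspect the header block (request line plus header lines) and decide
/// how the body is framed.
fn body_framing(head: &str) -> BodyFraming {
    let mut length = None;
    // Skip the request line, then look at each "Name: value" header line.
    for line in head.lines().skip(1) {
        let Some((name, value)) = line.split_once(':') else { continue };
        let (name, value) = (name.trim(), value.trim());
        if name.eq_ignore_ascii_case("transfer-encoding")
            && value.to_ascii_lowercase().contains("chunked")
        {
            // Transfer-Encoding takes precedence over Content-Length.
            return BodyFraming::Chunked;
        }
        if name.eq_ignore_ascii_case("content-length") {
            // Simplification: an unparsable value is treated as "no length".
            length = value.parse::<usize>().ok();
        }
    }
    length.map_or(BodyFraming::None, BodyFraming::Length)
}
```

Header names are compared case-insensitively because HTTP header names are not case-sensitive.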

Is the absence of EOF indeed what causes read_to_string to never return?

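Yes. read_to_string and read_to_end only return once they see EOF, and on a TCP stream EOF only occurs when the remote end closes the connection (or shuts down its sending side). The browser keeps the connection open while it waits for the response, so the call blocks. See "Reading from a TcpStream with Read::read_to_string hangs until the connection is closed by the remote end".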
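The difference can be reproduced on the loopback interface. The following is a minimal sketch, not production code: the `demo` function, the `REQUEST` constant, and the channel used to delay the close are all made up for this illustration, with a client thread standing in for the browser.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

const REQUEST: &[u8] = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n";

/// Returns (bytes from the first `read`, bytes from `read_to_end` after the peer closed).
fn demo() -> io::Result<(usize, usize)> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: the OS picks a free port
    let addr = listener.local_addr()?;
    let (close_tx, close_rx) = mpsc::channel::<()>();

    // Stand-in for the browser: send a request, then keep the connection open.
    let client = thread::spawn(move || -> io::Result<()> {
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(REQUEST)?;
        let _ = close_rx.recv(); // hold the connection open until told to close
        Ok(()) // `stream` is dropped here, which closes it and produces EOF
    });

    let (mut conn, _) = listener.accept()?;
    let mut buf = [0u8; 1024];
    // `read` returns as soon as some bytes are available; no EOF is needed.
    let first = conn.read(&mut buf)?;

    // Without this signal the client never closes and `read_to_end` below never returns.
    let _ = close_tx.send(());
    let mut rest = Vec::new();
    let after_close = conn.read_to_end(&mut rest)?;

    client.join().expect("client thread panicked")?;
    Ok((first, after_close))
}
```

The first `read` completes while the connection is still open, whereas `read_to_end` can only finish after the client side has been dropped.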

Is the Content-Length header mandatory, or are there cases where it's omitted? If so, how do I handle those cases?

It is not mandatory; see above and the standard.

What's the conventional way to do it in Rust?

I'm not familiar with how this is usually done in Rust, but I expect it to be no different from other languages like C:

  • Retrieve more data from the socket using read.
  • Check whether the end of the HTTP header has been reached; if not, read more data.
  • Once the end of the HTTP header is reached, analyse the request method and the Transfer-Encoding and Content-Length headers to find out whether there will be an HTTP body and how it should be read; see the standard for details.
  • Read the HTTP body if there should be one. Note that parts of the HTTP body (or even the next request) might already be contained in the data that has been read.
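The steps above can be sketched in Rust roughly as follows. This is a simplified illustration under several assumptions: `read_request` is a hypothetical helper, only Content-Length bodies are handled (chunked encoding is left out), there is no limit on header size, and any bytes belonging to a following request are simply discarded.

```rust
use std::io::{self, Read};

/// Reads one request from `stream` and returns (header block, body bytes).
fn read_request<R: Read>(stream: &mut R) -> io::Result<(String, Vec<u8>)> {
    let mut data = Vec::new();
    let mut chunk = [0u8; 1024];

    // Steps 1 and 2: keep reading until the blank line that ends the header.
    let header_end = loop {
        if let Some(pos) = data.windows(4).position(|w| w == b"\r\n\r\n") {
            break pos + 4;
        }
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "connection closed inside the header",
            ));
        }
        data.extend_from_slice(&chunk[..n]);
    };

    // Step 3: inspect the header to learn the body length (0 if absent).
    let head = String::from_utf8_lossy(&data[..header_end]).into_owned();
    let body_len = head
        .lines()
        .skip(1) // skip the request line
        .filter_map(|line| line.split_once(':'))
        .find(|(name, _)| name.trim().eq_ignore_ascii_case("content-length"))
        .and_then(|(_, value)| value.trim().parse::<usize>().ok())
        .unwrap_or(0);

    // Step 4: part of the body may already be in `data`; read only what is missing.
    let mut body = data.split_off(header_end);
    body.truncate(body_len); // extra bytes would belong to the next request
    let already = body.len();
    body.resize(body_len, 0);
    stream.read_exact(&mut body[already..])?;
    Ok((head, body))
}
```

Because the function is generic over `Read`, it can be exercised with an in-memory byte slice as well as with the `TcpStream` returned by `accept`.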

I don't want to do parsing in chunks, unless that's how it's usually done, because that could very easily turn out to be very messy.

Unfortunately that's how HTTP is: the header has no fixed size, so one has to work out whether it has already ended and where exactly it ends.
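One way to keep the variable-sized header manageable, assuming a buffered reader is acceptable, is to read it line by line with `BufRead::read_line` until the blank line. The `read_head` function below is a hypothetical sketch of that idea, not an established API, and it treats an early EOF the same as the end of the header.

```rust
use std::io::{self, BufRead};

/// Reads the request line and the headers, one line at a time,
/// stopping at the blank line that ends the header.
fn read_head<R: BufRead>(reader: &mut R) -> io::Result<(String, Vec<(String, String)>)> {
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    let request_line = request_line.trim_end().to_string();

    let mut headers = Vec::new();
    loop {
        let mut line = String::new();
        let n = reader.read_line(&mut line)?;
        let line = line.trim_end(); // strip the trailing CRLF
        if n == 0 || line.is_empty() {
            break; // EOF, or the blank line that ends the header
        }
        if let Some((name, value)) = line.split_once(':') {
            headers.push((name.trim().to_string(), value.trim().to_string()));
        }
    }
    Ok((request_line, headers))
}
```

With a socket, the stream would be wrapped once in `std::io::BufReader::new(stream)` and the same reader reused for the body, since the buffer may already hold body bytes.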
