I'm trying to understand the logic of performing HTTP requests in Java using the HttpURLConnection class (or the HttpsURLConnection class). Here is my code to perform a GET request and print the whole response payload to stdout.
URL url = new URI("https://swapi.dev/api/planets/1/").toURL();
String line;
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/json");
BufferedReader input = new BufferedReader(
new InputStreamReader(connection.getInputStream())
);
// read all data from response payload & print to screen
while ((line=input.readLine()) != null) {
System.out.println(line);
}
System.out.println("All data retrieved!");
Here's the problem: this piece of code worked fine; it successfully read all the info and finished running! My question is, why isn't the execution eventually blocked at the input.readLine() call in the while loop?
When I did a similar while-loop read at the TCP level, such a readLine() call would block until the connection was closed by the other side. According to the doc (here) of the HttpURLConnection class, the underlying TCP connection is usually not closed, for performance reasons. So if I imagine connection.getInputStream() as the analogue of socket.getInputStream() at the TCP level, I'd expect the input.readLine() call to eventually block, too.
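For reference, here is a minimal sketch of the raw-TCP version I'm describing (the host, port, and request line are just placeholders, not my real code). With a plain socket and an HTTP/1.1 keep-alive connection, the readLine() loop only ends once the server actually closes the socket:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawTcpGet {
    public static void main(String[] args) throws IOException {
        // Hypothetical host/port, just for illustration.
        try (Socket socket = new Socket("example.com", 80)) {
            OutputStream out = socket.getOutputStream();
            // HTTP/1.1 keeps the connection open by default (no "Connection: close"),
            // so the server does not close the socket right after the response.
            String request = "GET / HTTP/1.1\r\n"
                    + "Host: example.com\r\n"
                    + "\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            // Prints the status line, headers and body, then BLOCKS in readLine()
            // until the server eventually closes the idle connection
            // (e.g. when its keep-alive timeout expires).
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}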
I guess this method might be overridden somewhere in the case of HttpURLConnection (maybe to make use of the Content-Length header?). However, I can't find this override anywhere.
Since I'm trying to dive as deep as possible, it would be great if you could help locate exactly how the HttpURLConnection class causes my while-loop to eventually end.
CodePudding user response:
Why isn't the execution eventually blocked at the input.readLine() part in the while loop?

Because the Reader.readLine() call will return null when it reaches the end-of-stream on the input stream it is reading from.
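To see that this null has nothing to do with a connection being closed, here is a minimal, self-contained sketch: the reader simply reports end-of-stream when the underlying stream runs out of data.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineEof {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("line 1\nline 2\n"));
        String line;
        // readLine() never blocks here; it returns the two lines
        // and then null, because the stream has reached end-of-stream.
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("end-of-stream reached");
    }
}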
It will reach the end-of-stream on the client side when the server side finishes sending the response data ... and the client has read it all.
The end-of-stream may or may not actually correspond to the server closing the TCP/IP connection. It won't if HTTP chunked transfer encoding is being used on the connection. Chunked encoding lets the server send a response whose length isn't known in advance while still keeping the connection open for further requests and responses. The chunk headers tell the client side when it has reached the end of the response body ... and that it should signal end-of-stream on the input stream.
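As an illustrative sketch (not the JDK's actual implementation), this is roughly how a client can frame a chunked body: each chunk is preceded by its size in hex, and a zero-sized chunk marks the end of the body, so the reader can report end-of-stream without the socket being closed.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedBodyReader {

    /** Reads one CRLF-terminated line as ISO-8859-1 text (chunk headers are ASCII). */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                buf.write(b);
            }
        }
        return buf.toString("ISO-8859-1");
    }

    /** Decodes a chunked-encoded body and returns the raw payload bytes. */
    public static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Chunk size line: hex size, optionally followed by ";extensions".
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            int chunkSize = Integer.parseInt(sizeLine.trim(), 16);
            if (chunkSize == 0) {
                // Last chunk: skip any trailer headers up to the final blank line.
                while (!readLine(in).isEmpty()) {
                    // discard trailers
                }
                break; // at this point the body stream can report end-of-stream
            }
            // Read exactly chunkSize bytes of chunk data.
            byte[] chunk = new byte[chunkSize];
            int read = 0;
            while (read < chunkSize) {
                int n = in.read(chunk, read, chunkSize - read);
                if (n == -1) {
                    throw new IOException("Unexpected end of stream inside a chunk");
                }
                read += n;
            }
            body.write(chunk);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        return body.toByteArray();
    }
}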
The content-length header may or may not come into this¹. But bear in mind that the server can send a response without a content-length header at all ... meaning that the client side won't know ahead of time how much data to expect, and won't be able to use it for "framing" the response body.
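When a content-length is present, the client side can frame the body with a simple length-limited wrapper around the socket stream. The sketch below is an illustrative approximation of that idea (not the JDK's actual internal class): it reports end-of-stream after the declared number of bytes, even though the underlying connection stays open for reuse.

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Illustrative only: exposes exactly 'contentLength' bytes of the
 * underlying stream, then reports EOF, without ever closing the
 * underlying (keep-alive) connection.
 */
public class LengthLimitedInputStream extends FilterInputStream {

    private long remaining;

    public LengthLimitedInputStream(InputStream in, long contentLength) {
        super(in);
        this.remaining = contentLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // body fully consumed: signal end-of-stream to the caller
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1; // readers like BufferedReader see this as end-of-stream
        }
        int toRead = (int) Math.min(len, remaining);
        int n = super.read(buf, off, toRead);
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public void close() {
        // Deliberately does not close the underlying stream,
        // so the TCP connection can be reused for the next request.
    }
}

A BufferedReader wrapped around a stream like this sees readLine() return null as soon as the declared number of bytes has been consumed, which matches the behaviour observed in the question.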
For more details, refer to the RFCs that specify HTTP. (This is better than looking at the client-side Java source code. The client side code only tells you half of the story.)
1 - The HTTP specs say that if the server includes a content-length header in the response, it must send exactly that number of bytes. However, they don't mandate any specific client-side behavior if the server sends fewer or more bytes than it said it would.