Using a standard Java HTTP Client, I load a page at this address:
How can I decode the HTTP response without errors?
CodePudding user response:
Several things can (but do not have to) indicate the character encoding of the response:
- When the client sends a request, it can include an Accept-Charset header as a hint about which character sets it would like to receive. Note that Accept-Encoding is a different header: it negotiates compression such as gzip, not the character set.
- When the server sends the response, it should declare the character set in the charset parameter of the Content-Type header, for example Content-Type: text/html; charset=windows-1251. Content-Encoding, again, only describes compression.
- The response body may contain meta tags, as you mention. The disadvantage here is that the client already has to assume some encoding in order to read them, so this is less reliable.
- Even so, it seems you apply the BodyHandler for Windows-1251 regardless of which encoding is actually used.
With that, your setup looks quite fragile. You had better check whether the Content-Type header is present and carries a meaningful charset parameter.
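The character set of a response is normally declared in the charset parameter of the Content-Type header, so one way to make the setup less fragile is to read the raw bytes and decode them with whatever that header declares. The following is only a rough sketch of that idea: the class CharsetSketch and the methods charsetFromContentType and fetchDecoded are names made up for illustration (they are not part of the java.net.http API), the parsing is deliberately simple, and UTF-8 is assumed as the fallback.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value such as "text/html; charset=windows-1251". Returns the fallback
    // when the parameter is missing or names a charset this JVM cannot handle.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Hypothetical helper: downloads the body as bytes and decodes it with the
    // charset declared by the server, assuming UTF-8 when none is declared.
    static String fetchDecoded(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers().firstValue("Content-Type").orElse(null);
        Charset charset = charsetFromContentType(contentType, StandardCharsets.UTF_8);
        return new String(response.body(), charset);
    }

    public static void main(String[] args) {
        // No network access here: only show what the parser makes of two header values.
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

Calling CharsetSketch.fetchDecoded with the page's address would then return the body decoded according to the server's own declaration instead of a hard-coded Windows-1251.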
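Edit: When testing the code from the question, I was able to reproduce the reported problems, but they vanished when I simply relied on the default behaviour of the client. BodyHandlers.ofString() decodes the body using the charset from the Content-Type response header and falls back to UTF-8 when none is given: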
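import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("https://www.youtube.com/watch?v=ELArlE7gSmw"))
                .GET()
                .build();
        HttpClient client = HttpClient.newHttpClient();
        // No explicit charset: let the client pick it from the response headers
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}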
CodePudding user response:
Solved the problem.
For IntelliJ IDEA: File > Settings > Editor > File Encodings.
Set the "Global Encoding" and "Project Encoding" fields to "System Default" (not UTF-8 or Windows-1251, but the default). After that, the whole output is displayed correctly.
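That an IDE setting fixed the output suggests the garbling may have happened while printing rather than while decoding the response. As a small sketch of why the output charset matters, the following shows how the same string turns into different bytes depending on the charset a PrintStream writes with. The class ConsoleEncodingSketch and the method printedBytes are names made up for illustration, and the sample text is an arbitrary Cyrillic word written with Unicode escapes.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConsoleEncodingSketch {

    // Hypothetical helper: returns the bytes a PrintStream using the given
    // charset would write for the text. Characters the charset cannot
    // represent are replaced by the stream, which is how output gets garbled.
    static byte[] printedBytes(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, charset)) {
            out.print(text);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, escaped so the source file encoding does not matter.
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Bytes as UTF-8: " + printedBytes(sample, StandardCharsets.UTF_8).length);
        System.out.println("Bytes as ISO-8859-1: " + printedBytes(sample, StandardCharsets.ISO_8859_1).length);
    }
}
```

If the console and the JVM disagree about the charset, wrapping System.out in a PrintStream with an explicit charset is another option besides changing the IDE setting.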