I'm looking at this example of making a simple HTTP request in Python using only the built-in socket module:
import socket
target_host = "www.google.com"
target_port = 80
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((target_host, target_port))
client.send(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")
response = client.recv(4096)
client.close()
print(response)
When I run this code, I get back a 301:
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
I'm confused by this because the "new location" looks identical to the URL I requested. Using curl or wget on the same URL (www.google.com) returns a 200. I'm having a hard time understanding what is different. Are curl/wget getting the same 301 "behind the scenes" and just automatically requesting the 'new' resource? And if so, how is that possible given that, as mentioned above, the 'new' location appears identical to the original?
CodePudding user response:
I'm confused by this because the "new location" looks identical to the URL I requested
It doesn't. Your Host header says that you are accessing google.com, i.e. without www:
client.send(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")
This gets redirected to www.google.com, i.e. with www:
<A HREF="http://www.google.com/">here</A>.
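One way to avoid this kind of mismatch is to build the request from the same host name you connect to, so the Host header cannot drift from the target. Below is a minimal sketch of that idea. The helper names (build_get_request, parse_status_and_location, fetch_head) are made up for illustration, and the "Connection: close" header is my own addition, not part of the original code. The parser is deliberately naive and only meant for inspecting the status code and Location header of a small response.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request whose Host header matches `host`."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: ask the server to close after replying
        "\r\n"
    ).encode("ascii")


def parse_status_and_location(response: bytes):
    """Return (status_code, Location header value or None) from a raw response."""
    # Everything before the first blank line is the status line plus headers.
    head = response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
            break
    return status_code, location


def fetch_head(host: str, port: int = 80) -> bytes:
    """Connect to `host` and return the first chunk of the response."""
    with socket.create_connection((host, port), timeout=10) as client:
        # The same `host` is used for the TCP connection and the Host header.
        client.sendall(build_get_request(host))
        return client.recv(4096)
```

Calling parse_status_and_location(fetch_head("google.com")) and then the same with "www.google.com" should let you compare the two cases side by side: the status code and any Location header show directly whether the server is redirecting you and where to.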