I was trying to run the following sample code in Python 3.8:
import socket
mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com',80))
# send() needs bytes, not a str, so I added the b prefix (I also tried encode())
mysock.send(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data)
mysock.close()
But the server responds with a 404 Not Found error:
b'HTTP/1.1 404 Not Found\r\nServer: nginx\r\nDate: Tue, 02 Nov 2021 04:38:35 GMT\r\nContent-Type: text/html\r\nContent-Length: 146\r\nConnection: close\r\n\r\n<html>\r\n<head><title>404 Not Found</title></head>\r\n<body>\r\n<center><h1>404 Not Found</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'
I tried the same request with urllib, and it works fine:
import urllib.request
output = urllib.request.urlopen('http://www.py4inf.com/code/romeo.txt')
for line in output:
    print(line.strip())
Does anybody know how to fix this? Please help me figure out where I am going wrong in the first block of code. Thanks in advance!
CodePudding user response:
mysock.send(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')
This is not a valid HTTP request for several reasons:
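- the request line should contain the absolute path (/code/romeo.txt), not the absolute URL. In HTTP/1.0 the absolute-URL form is only meant for requests sent to a proxy
- it should contain a Host header. This is strictly required only for HTTP/1.1, not for HTTP/1.0, but servers usually expect it anyway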
- the line endings must be \r\n, not \n
The following works:
mysock.send(b'GET /code/romeo.txt HTTP/1.0\r\nHost: www.py4inf.com\r\n\r\n')
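The same request can also be derived from the URL instead of being typed by hand. The following is only a minimal sketch: build_get_request is a hypothetical helper name (not part of the original code), and it assumes a plain http:// URL and an HTTP/1.0 request with no headers other than Host.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request for an http:// URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # The request line carries only the path (plus the query string, if any).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The host name goes into the Host header, not into the request line.
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.netloc}",
        "",  # empty line that ends the header section
        "",
    ]
    # Every line is terminated with \r\n, never a bare \n.
    return "\r\n".join(lines).encode("ascii")
```

With this helper, mysock.send(build_get_request('http://www.py4inf.com/code/romeo.txt')) sends the same bytes as the corrected line above.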
In general: HTTP might look simple, but it is not. Don't assume how things work from looking at a bit of traffic; follow the actual standard. A request that happens to work with one specific server today might break later or fail with another server.
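For reference, here is a sketch of the original script with the corrected request, reading until the server closes the connection and then separating the status line and headers from the body. The function names fetch and split_response are hypothetical, the 10-second timeout is an arbitrary choice, and the code assumes the server closes the connection after responding, as it did in the output shown in the question.

```python
import socket


def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The header section ends at the first empty line (\r\n\r\n).
    head, _, body = raw.partition(b"\r\n\r\n")
    # The first header-section line is the status line.
    status_line, _, headers = head.partition(b"\r\n")
    return status_line, headers, body


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send a minimal HTTP/1.0 GET request and return the raw response."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        # sendall() keeps sending until the whole request has been written.
        sock.sendall(request)
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling split_response(fetch('www.py4inf.com', '/code/romeo.txt')) returns the status line as the first element, which makes an error such as a 404 easy to spot before the body is printed. What the server actually returns depends on its current state.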