I'm trying to read URLs from a file and print the Server response header for each one, using the script below:
import requests
file = open('urls.txt','r')
for url in file:
    print(url)
    r = requests.head(url)
    print(r.headers["Server"])
I keep getting this error message:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 156, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Can you all please assist? Thanks!
CodePudding user response:
End of line character:
Your file urls.txt has an end-of-line character at the end of each line, something like this:
www.google.com\n
https://stackoverflow.com/\n
https://github.com/\n
https://www.google.com/
When you read the file line by line, the \n character is read as part of each URL, which is what causes requests.head(url) to fail.
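To see this for yourself, here is a minimal sketch that iterates over a file-like object and prints each line with repr() so the trailing newline becomes visible. It assumes io.StringIO as a stand-in for open('urls.txt', 'r'), and the contents are made up for illustration.

```python
import io

# io.StringIO behaves like an open text file; the URLs are illustrative only.
fake_file = io.StringIO("https://stackoverflow.com/\nhttps://github.com/\n")

# Iterating over a file yields each line including its line terminator.
raw_lines = [line for line in fake_file]

for line in raw_lines:
    # repr() shows the otherwise invisible "\n" at the end of each line.
    print(repr(line))
```

Each printed value ends in \n, which is exactly what gets passed to requests.head(url) in the original loop.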
Removing the end of line character:
This is an easy fix. To remove end-of-line characters, you can use the Python string method .strip(), which removes the newline character as well as any leading/trailing whitespace.
Another option is splitlines(), which handles the EOL characters for you. For example:
temp = open(filename,'r').read().splitlines()
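The two options behave slightly differently, which the following sketch tries to show. The sample text is made up, and io.StringIO again stands in for a real file: strip() also trims spaces around the URL, while splitlines() only removes the line terminators, and neither one drops blank lines on its own.

```python
import io

# Illustrative input: one padded line, one blank line, no newline on the last line.
sample = "https://stackoverflow.com/\n  https://github.com/  \n\nhttps://www.google.com/"

# Option 1: strip() each line while iterating, as you would over a file object.
stripped = [line.strip() for line in io.StringIO(sample)]

# Option 2: splitlines() removes line terminators but keeps other whitespace.
split = sample.splitlines()

# Combining both and skipping blank lines leaves only usable URLs.
urls = [line.strip() for line in sample.splitlines() if line.strip()]

print(stripped)
print(split)
print(urls)
```

If the file may contain blank lines or padded entries, the combined form is probably the safest of the three.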
Side note: always use a with statement when reading or writing a file, as it automatically closes the file for you once you leave the block.
import requests
with open("urls.txt", "r") as url_file:
    temp = url_file.read().splitlines()
for url in temp:
    print(url)
    r = requests.head(url)
    print(r.headers["Server"])
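If you want the loop to keep going when one URL is bad, something along these lines might help. It is only a sketch: clean_urls and server_header are hypothetical helper names, the 5-second timeout is an arbitrary choice, and the head parameter exists purely so the function can be tried without a network connection.

```python
import requests


def clean_urls(lines):
    """Strip newlines/whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def server_header(url, head=requests.head):
    """Return the Server header for url, or None if it is missing or the request fails.

    head is a hypothetical injection point (defaults to requests.head).
    """
    try:
        response = head(url, timeout=5)  # timeout value is an assumption
    except requests.RequestException as exc:
        # Covers DNS failures, refused connections, timeouts and malformed URLs.
        print(f"{url}: request failed ({exc})")
        return None
    # .get() avoids a KeyError when the server sends no Server header.
    return response.headers.get("Server")
```

Inside the with block you would call clean_urls(url_file) and then print server_header(url) for each result. Note that an entry without a scheme, such as www.google.com, makes requests raise MissingSchema, which is a RequestException subclass, so it gets reported here instead of stopping the script.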