Unable to retrieve response headers from file containing URLs using simple python script

Time:12-04

I'm trying to read URLs from a file and print each URL's response headers using the script below:

import requests

file = open('urls.txt','r')

for url in file:
    print(url)  
    r = requests.head(url)
    print(r.headers["Server"])

I keep getting this error message:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 156, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

Can you all please assist? Thanks!

CodePudding user response:

End of line character:

The lines in your file urls.txt end with an end-of-line character, something like this:

www.google.com\n
https://stackoverflow.com/\n
https://github.com/\n
https://www.google.com/

When you iterate over the file, the trailing \n is read as part of each URL, and that is what causes the problem in requests.head(url).
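To see the stray character for yourself, here is a minimal sketch that uses io.StringIO as a stand-in for urls.txt (the in-memory file and its two URLs are assumptions for illustration, not your actual data):

```python
import io

# In-memory stand-in for urls.txt (hypothetical contents).
fake_file = io.StringIO("https://stackoverflow.com/\nhttps://github.com/\n")

# Iterating over a file object yields each line with its newline still attached.
lines = list(fake_file)

# repr() makes the otherwise invisible trailing newline visible.
print(repr(lines[0]))  # prints 'https://stackoverflow.com/\n'
```

Printing with repr() instead of a plain print() is a quick way to spot invisible characters like this.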

Removing the end of line character:

This is an easy fix. To remove the end-of-line character, you can use Python's string method .strip(), which removes newline characters as well as any other leading/trailing whitespace.
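A small sketch of the .strip() approach, assuming you also want to skip blank lines (the helper name clean_urls and the sample lines are made up for this example):

```python
def clean_urls(lines):
    """Strip surrounding whitespace (including the newline) and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


# Hypothetical lines as they would come out of iterating over a file.
raw_lines = ["https://stackoverflow.com/\n", "\n", "  https://github.com/  \n"]

print(clean_urls(raw_lines))
# prints ['https://stackoverflow.com/', 'https://github.com/']
```

In your loop this would amount to calling url.strip() before passing the value to requests.head().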

Another option is splitlines(), which handles the EOL characters for you. For example:

temp = open(filename,'r').read().splitlines()
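One difference worth knowing: splitlines() only removes the line breaks, while .strip() also removes spaces and tabs around each entry. A quick sketch on a literal string (the sample text is an assumption, standing in for the file contents):

```python
# Stand-in for the result of open(filename, 'r').read() (hypothetical contents).
text = "https://stackoverflow.com/\n  https://github.com/  \n"

# splitlines() drops the line breaks but keeps other whitespace.
by_splitlines = text.splitlines()

# Combining it with strip() also removes the surrounding spaces.
by_strip = [line.strip() for line in text.splitlines()]

print(by_splitlines)  # prints ['https://stackoverflow.com/', '  https://github.com/  ']
print(by_strip)       # prints ['https://stackoverflow.com/', 'https://github.com/']
```

If the file might contain stray spaces, combining both is probably the safer choice.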

Side note: Always use a with statement when reading or writing a file, as it automatically closes the file for you once you leave the block.

import requests

with open("urls.txt", "r") as url_file:
    temp = url_file.read().splitlines()
    for url in temp:
        print(url)
        r = requests.head(url)
        print(r.headers["Server"])
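If you want the script to keep going when one URL fails, the following is one possible sketch rather than a definitive version. It assumes you are fine with skipping failures, uses a made-up helper name server_header, and adds a timeout value chosen arbitrarily. An entry without a scheme, such as www.google.com, makes requests raise MissingSchema, which is a RequestException and is caught here too. Not every server sends a Server header, so .get() is used instead of indexing.

```python
import requests


def server_header(url, head=requests.head, timeout=5):
    """Return the Server header for url, or None if it is missing or the request fails.

    `head` is injectable so the function can be tried without network access.
    """
    try:
        response = head(url, timeout=timeout)
    except requests.RequestException as exc:
        # Covers DNS failures, connection errors, timeouts and malformed URLs.
        print(f"{url}: request failed ({exc})")
        return None
    # .get() avoids a KeyError when the server does not send this header.
    return response.headers.get("Server")
```

It could then be called as server_header(url.strip()) for each line read from urls.txt.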