I need to build an availability checker for an arbitrary number of sites. How can I make the script read the URLs from a .txt file and loop over each of them?
My approximate (non-working) code:
import requests

urls_list = open('G:\\urls_list.txt', 'r')
for url in urls_list:
    response = requests.get(url)
    if response.status_code != 200:
        print('{} is not active'.format(url))
CodePudding user response:
This happens because, when iterating with for url in urls_list:, each url string ends with a newline character \n. Use str.rstrip to remove it. Also, prefer a context manager when reading files (with open("urls.txt") as f:); it handles closing the file for you.
import requests

with open("urls.txt") as f:
    for url in map(str.rstrip, f):
        print("-" * 34)
        print(f"{url = }")
        try:
            response = requests.get(url)
        except requests.ConnectionError as e:
            print(e)
            continue
        print(f"{response.status_code = }")
----------------------------------
url = 'https://github.com/'
response.status_code = 200
----------------------------------
url = 'https://githubb.com/'
HTTPSConnectionPool(host='githubb.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f26a6758f70>: Failed to establish a new connection: [Errno -2] Name or service not known'))
----------------------------------
url = 'https://www.gnu.org/software/bash/manual/bash.html'
response.status_code = 200
----------------------------------
url = 'https://www.gnu.org/software/bash/manual/bashh.html'
response.status_code = 404
----------------------------------
url = 'https://stackoverflow.com/'
response.status_code = 200
----------------------------------
url = 'https://stackoverfloww.com/'
response.status_code = 200