tl;dr f-string is messing up the script below. List printed is empty despite the file containing a list of URLs. How can I fix this problem and have Python print out the URLs?
So I have a script below. It downloads a list of URLs, converts it into a list, and then prints it out. Now, for the variable link, there's an f-string. If I keep just one value in the f-string (say I delete fromdate and todate and just keep username), it works just fine. But if I put multiple values in the f-string, the script fails.
COMMAND
script.py -u mrbeast
SCRIPT
import argparse, re, requests
parser = argparse.ArgumentParser()
parser.add_argument('-u','--username', required=False)
parser.add_argument('-from','--fromdate', required=False)
parser.add_argument('-to','--todate', required=False)
args = vars(parser.parse_args())
username = args['username']
fromdate = args['fromdate']
todate = args['todate']
link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/{}/status&matchType=prefix&from={}&to={}".format(username,fromdate,todate)
listy = []
m = requests.get(link).text
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', m)
for b, url in enumerate(urls):
    listy.append(f"{b}: {url}")
print(listy)
OUTPUT
[]
CodePudding user response:
You are seeing this behaviour not because of f-strings, but because of how Python formats your link. The variable link does not use an f-string at all, just str.format(). When an option is omitted on the command line, argparse sets it to None, and format() inserts the text None into the URL instead of leaving that field blank as intended. With no options given, the URL looks like this:
https://web.archive.org/cdx/search/cdx?url=twitter.com/None/status&matchType=prefix&from=None&to=None
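To see this for the exact command in the question, here is a small sketch that only rebuilds the link and makes no request. The argument list is passed to parse_args() directly to stand in for running script.py -u mrbeast; everything else is taken from the script above.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-u', '--username', required=False)
parser.add_argument('-from', '--fromdate', required=False)
parser.add_argument('-to', '--todate', required=False)

# Simulate "script.py -u mrbeast": the two date options are omitted,
# so argparse stores None for both of them.
args = vars(parser.parse_args(['-u', 'mrbeast']))

link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/{}/status&matchType=prefix&from={}&to={}".format(
    args['username'], args['fromdate'], args['todate'])

# format() calls str() on each value, so None ends up as the text "None".
print(args)
print(link)
```

Here only from and to come out as None, which is enough to change what the request asks for.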
One solution is to use Python's or operator to fall back to an empty string when a value is None. This can be done where the variables are assigned, for example:
username = args['username'] or '' # falls back to '' when the value is None (or otherwise falsy)
fromdate = args['fromdate'] or ''
todate = args['todate'] or ''
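Another option, as a rough sketch rather than the only way to do it, is to leave unset options out of the query string entirely instead of sending them empty. The helper name build_cdx_link and the CDX_ENDPOINT constant are made up for this example, and it assumes a username is always supplied (for instance by making -u required).

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"  # same endpoint as in the question


def build_cdx_link(username, fromdate=None, todate=None):
    """Hypothetical helper: build the query URL, skipping options that were not given."""
    params = {
        "url": f"twitter.com/{username}/status",
        "matchType": "prefix",
        "from": fromdate,
        "to": todate,
    }
    # Keep only the parameters that actually have a value.
    supplied = {key: value for key, value in params.items() if value is not None}
    # safe="/" keeps the slashes in the url value unescaped, as in the original link.
    return f"{CDX_ENDPOINT}?{urlencode(supplied, safe='/')}"


print(build_cdx_link("mrbeast"))
print(build_cdx_link("mrbeast", fromdate="2020", todate="2021"))
```

With this approach the text None never reaches the URL, and from and to only appear when they were passed on the command line.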
I hope this helps, and welcome to Stack Overflow.