f-string interfering with extracting URLs

Time:02-04

tl;dr: An f-string is messing up the script below. The printed list is empty even though the downloaded file contains a list of URLs. How can I fix this so that Python prints the URLs?

The script below downloads a list of URLs, turns it into a Python list, and prints it. The variable link is built with an f-string. If I keep just one value in it (say I delete fromdate and todate and keep only username), it works fine. But if I put multiple values in it, the script fails.

COMMAND

script.py -u mrbeast

SCRIPT

import argparse, re, requests

parser = argparse.ArgumentParser()
parser.add_argument('-u','--username', required=False)
parser.add_argument('-from','--fromdate', required=False)
parser.add_argument('-to','--todate', required=False)
args = vars(parser.parse_args())
username = args['username']
fromdate = args['fromdate']
todate = args['todate']

link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/{}/status&matchType=prefix&from={}&to={}".format(username,fromdate,todate)
listy = []

m = requests.get(link).text
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', m)

for b, url in enumerate(urls):
    listy.append(f"{b}: {url}")
    
print(listy)

OUTPUT

[]

ANSWER

You are seeing this behaviour not because of f-strings, but because of how Python formats your link. The variable link does not use an f-string at all; it uses str.format. Any option you leave off the command line is stored by argparse as None, and str.format inserts that as the literal text None instead of leaving the field blank as intended. With no options passed, the URL ends up looking like this:

https://web.archive.org/cdx/search/cdx?url=twitter.com/None/status&matchType=prefix&from=None&to=None
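For the command in the question (script.py -u mrbeast), only the two date fields are affected. A small sketch that reproduces this without touching the network, assuming the same parser setup as the question and passing the arguments to parse_args directly instead of reading them from the command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-u', '--username', required=False)
parser.add_argument('-from', '--fromdate', required=False)
parser.add_argument('-to', '--todate', required=False)

# Simulate running: script.py -u mrbeast
args = vars(parser.parse_args(['-u', 'mrbeast']))
print(args)  # the two omitted options are stored as None

# str.format turns each None into the four-character text "None"
link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/{}/status&matchType=prefix&from={}&to={}".format(
    args['username'], args['fromdate'], args['todate']
)
print(link)
```

Printing the link before requesting it is a quick way to spot this kind of problem.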

One solution is to use the or operator to fall back to an empty string when the value is None. This can be done where the variables are assigned. One possible way is below:

username = args['username'] or ''  # falls back to '' when the value is None (or otherwise falsy)
fromdate = args['fromdate'] or ''
todate = args['todate'] or ''
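Another option, if you would rather not send empty from= and to= fields at all, is to add each date parameter only when it was supplied. This is a minimal sketch, not something from the question: the helper name build_cdx_link is made up for illustration, and it assumes the server accepts a query with the date parameters left out entirely.

```python
from urllib.parse import urlencode

def build_cdx_link(username, fromdate=None, todate=None):
    """Build the CDX query URL, skipping any date bound that was not supplied."""
    params = {
        "url": f"twitter.com/{username}/status",
        "matchType": "prefix",
    }
    # Only add the date bounds that actually have a value.
    if fromdate:
        params["from"] = fromdate
    if todate:
        params["to"] = todate
    # safe="/" keeps the slashes in the url value as-is instead of %2F
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params, safe="/")

print(build_cdx_link("mrbeast"))
print(build_cdx_link("mrbeast", fromdate="2020", todate="2021"))
```

The result can be passed to requests.get exactly like link in the original script.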

I hope this helps, and welcome to Stack Overflow.
