I have a text file containing a list of URLs, each followed by some unwanted text. I wrote a regex that extracts what I need, and it works, but every result in the output is wrapped in unwanted brackets and quotes (['...']). Examples below.
File content (a list of URLs):
http://www.example.com/52 (Status: 403) [Size: 919]
http://www.example.com/details (Status: 403) [Size: 919]
http://www.example.com/h (Status: 403) [Size: 919]
http://www.example.com/affiliate (Status: 403) [Size: 919]
http://www.example.com/56 (Status: 403) [Size: 919]
The regex I used is: "^[://.a-zA-Z0-9-_]*"
The output is:
['http://www.example.com/52']
['http://www.example.com/details']
['http://www.example.com/h']
['http://www.example.com/affiliate']
['http://www.example.com/56']
I need the output to look like this:
http://www.example.com/52
http://www.example.com/details
http://www.example.com/h
http://www.example.com/affiliate
http://www.example.com/56
The code I am using:
import re
with open("test.txt", "r") as test:
    for i in test:
        x = re.findall("^[://.a-zA-Z0-9-_]*", i)
        print(x)
Answer:
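re.findall returns a list of all matches, which is why every line is printed as ['...']. You can either print the first element of the result with print(x[0]), or, since there is only one URL per line, use re.match instead. It returns a single match object, and group(0) gives the matched text as a plain string: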
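import re

with open("test.txt", "r") as test:
    for i in test:
        # re.match only tries the start of the string, so ^ is not needed
        x = re.match(r"[://.a-zA-Z0-9-_]*", i)
        print(x.group(0))  # group(0) is the whole match, i.e. the URL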
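A minimal sketch that puts both options side by side. It assumes the sample lines are held in a list instead of being read from test.txt, and the helper names extract_url and extract_url_findall are hypothetical, not part of the original answer.

```python
import re

# Same character class as above; the "-" right after 0-9 is read as a literal hyphen.
URL_PATTERN = re.compile(r"[://.a-zA-Z0-9-_]*")

# Assumption: sample lines copied from the question, used instead of test.txt.
SAMPLE_LINES = [
    "http://www.example.com/52 (Status: 403) [Size: 919]\n",
    "http://www.example.com/details (Status: 403) [Size: 919]\n",
    "http://www.example.com/h (Status: 403) [Size: 919]\n",
    "http://www.example.com/affiliate (Status: 403) [Size: 919]\n",
    "http://www.example.com/56 (Status: 403) [Size: 919]\n",
]


def extract_url(line):
    """Hypothetical helper: return the leading URL of one line as a plain string."""
    # match() only tries position 0, and * allows an empty match,
    # so the result is never None; a line with no URL gives "".
    return URL_PATTERN.match(line).group(0)


def extract_url_findall(line):
    """Hypothetical helper: the findall variant, taking the first list element."""
    # Without re.MULTILINE, ^ only matches at position 0, so the list has one element.
    return re.findall(r"^[://.a-zA-Z0-9-_]*", line)[0]


urls = [extract_url(line) for line in SAMPLE_LINES]
for url in urls:
    print(url)
```

Running it prints the five URLs one per line, without brackets or quotes. The trailing "\n" of each line is not in the character class, so it never ends up in the result. Compiling the pattern once with re.compile avoids re-parsing it per line, although the re module also caches recently used patterns, so the difference is small here.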