I'm trying to remove duplicate lines, and lines containing certain words, from scraped data. I have tried various snippets I found, but they are not working :(
This is my code. Only the first part, which removes the duplicate lines, works:
from time import sleep

openFile = open("links.txt", "r")
writeFile = open("updatedfile.txt", "w")
# Store traversed lines
tmp = set()
for txtLine in openFile:
    # Check whether the line has been seen before
    if txtLine not in tmp:
        writeFile.write(txtLine)
        # Add the newly traversed line to tmp
        tmp.add(txtLine)
openFile.close()
writeFile.close()
sleep(5)
with open("updatedfile.txt", "r") as fp:
    lines = fp.readlines()
with open("updatedfile.txt", "w") as fp:
    for line in lines:
        if line.strip("\n") != "search":
            fp.write(line)
This is the links.txt file:
https://twitter.com/search?q=#BTC&src=hashtag_click
https://twitter.com/search?q=#ADA&src=hashtag_click
https://twitter.com/search?q=#LTC&src=hashtag_click
https://twitter.com/search?q=#CAKE&src=hashtag_click
https://twitter.com/Marie62943337
https://twitter.com/Marie62943337
https://twitter.com/Fathur0501
https://twitter.com/Fathur0501
https://twitter.com/BogdanMar93
https://twitter.com/BogdanMar93
https://t.[spaced because body cannot contain short url]co/74ZzkVwa2W
https://t. co/Gv2tyiWfAk
I want the output to be:
https://twitter.com/Marie62943337
https://twitter.com/Fathur0501
https://twitter.com/BogdanMar93
Thanks for your help.
CodePudding user response:
Check this code; I think it works:
with open("test.txt", "r") as fp:
lines = fp.readlines()
fp.close()
unique = set()
with open("test.txt", "w") as fp:
for line in lines:
if "search" not in line and line not in unique and "twitter.com" in line:
fp.write(line)
unique.add(line)
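For reference, here is a minimal sketch that combines the duplicate check and the keyword filter in a single pass, using the filenames from the question (links.txt and updatedfile.txt); treat it as a starting point rather than a drop-in replacement:

seen = set()
with open("links.txt", "r") as src, open("updatedfile.txt", "w") as dst:
    for line in src:
        # Skip hashtag search links, non-twitter.com links, and duplicates
        if "search" in line or "twitter.com" not in line or line in seen:
            continue
        dst.write(line)
        seen.add(line)

Writing the result to a separate file also means links.txt is never overwritten while it is still being read.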
If you have any questions, please share them in the comments below.
CodePudding user response:
Maybe you want to use the 'in' operator, like this:
lines = ['https://twitter.com/search?q=#CAKE&src=hashtag_click', 'https://twitter.com/Marie62943337']
for line in lines:
    if 'search' not in line:
        print(line)
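Applied to the second step from the question (reading updatedfile.txt back and rewriting it), the same 'in' check could look like the sketch below; it assumes updatedfile.txt already has the duplicates removed by the first part of the script:

with open("updatedfile.txt", "r") as fp:
    lines = fp.readlines()
with open("updatedfile.txt", "w") as fp:
    for line in lines:
        # 'in' matches "search" anywhere in the line,
        # whereas != only matches a line that is exactly "search"
        if 'search' not in line:
            fp.write(line)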