I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.
I'm trying to read both files and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus the entries in whitelist.txt.
I'm pretty new to Python and still learning. I've searched for answers, and this is what I came up with:
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()
    unique2 = set()
    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()
    unique3 = set()
    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error and I'm also not sure if I'm doing it the right way:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
How can I achieve the result described above, and is my approach right?
EDIT:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (this is how it should be):
USA
CANADA
GERMANY
CodePudding user response:
Based on your description, I made some changes to your code.
- The readlines() method is replaced with read().splitlines(). Both read the whole file and turn each line into a list item. The difference is that readlines() keeps the \n at the end of each item.
- unique2 and unique3 are removed; I couldn't find a use for them. unique2 was always an empty set, and testing unique2 not in line (a set against a string) is what raised the TypeError.
- After the first two blocks, whitelist_lines and scrapedlist_lines are two lists that contain the links. Based on your description, we need the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.
- If you are using Python 2.5 or higher, the with statement calls close() for you automatically, so the explicit close() calls are not needed.
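To illustrate the difference between readlines() and read().splitlines(), here is a small sketch. It uses an in-memory io.StringIO object as a stand-in for the file (an assumption made only so the snippet runs without any files on disk):

```python
import io

# Stand-in for the contents of whitelist.txt (assumed: one entry per line)
sample = "KUWAIT\nISRAEL\nFRANCE\n"

# readlines() keeps the trailing newline on every item
with_newlines = io.StringIO(sample).readlines()

# read().splitlines() drops the line endings
without_newlines = io.StringIO(sample).read().splitlines()

print(with_newlines)     # ['KUWAIT\n', 'ISRAEL\n', 'FRANCE\n']
print(without_newlines)  # ['KUWAIT', 'ISRAEL', 'FRANCE']
```

This matters for the comparison: "FRANCE" and "FRANCE\n" are different strings, so a last line without a trailing newline would not match its whitelist entry if the newlines were kept.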
The final code is:
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        # Keep only the lines that do not appear in the whitelist
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")
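If the files get large, checking membership in a set is faster than checking a list. The following is a minimal sketch of that variation, not a required change. The function name filter_whitelisted and its parameters are made up for illustration, and it assumes one entry per line and that surrounding whitespace and blank lines can be ignored:

```python
def filter_whitelisted(scraped_path, whitelist_path, output_path):
    """Write the entries of scraped_path that are not in whitelist_path.

    Hypothetical helper for illustration; returns the kept entries.
    """
    # Build a set of whitelist entries for fast membership tests
    with open(whitelist_path, "r") as whitelist_file:
        whitelist = {line.strip() for line in whitelist_file if line.strip()}

    # Keep scraped entries, in their original order, that are not whitelisted
    kept = []
    with open(scraped_path, "r") as scraped_file:
        for line in scraped_file:
            entry = line.strip()
            if entry and entry not in whitelist:
                kept.append(entry)

    # Write one kept entry per line
    with open(output_path, "w") as output_file:
        for entry in kept:
            output_file.write(entry + "\n")
    return kept
```

With the sample files from the question, calling filter_whitelisted("scrapedlist.txt", "whitelist.txt", "updatedlist2.txt") should leave USA, CANADA and GERMANY in updatedlist2.txt. Like the loop above, it keeps duplicate entries if scrapedlist.txt contains any.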