I am scraping the public contact emails of user channels by channel ID, based on my keywords. When I scrape a large number of channel IDs, some IDs repeat, so the emails repeat too. Before I write them line by line to my text file, I need to check for duplicate emails and skip any email that already exists in the file.
I would also be grateful if you could show me how to remove empty spaces. My current code works only some of the time, and it sometimes writes a line that contains nothing but a space.
My code, which writes all emails line by line:
with open("scraped_emails.txt", 'a') as f:
    for email in cleanEmail:
        f.write(email.replace(" ", "") + '\n')
CodePudding user response:
You can just add an if statement that checks whether the email you want to append is already in the file:
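cleanEmail = ['[email protected]', ' [email protected] ', '[email protected]']
# 'r+' opens the file for reading and writing; the file must already exist
with open("scraped_emails.txt", 'r+') as f:
    # After read() the position is at the end of the file, so write() appends
    emails = set(f.read().splitlines())
    for email in cleanEmail:
        email = email.strip()  # clean the value before comparing it
        if email and email not in emails:
            f.write(email + '\n')
            emails.add(email)  # also catches repeats within this batch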
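Note that I added the strip() method, which solves your empty-space problem by removing both leading and trailing whitespace. An entry that contains only whitespace becomes an empty string after strip(), so it is worth skipping such entries instead of writing them as blank lines.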
# Output
[email protected]
[email protected]
[email protected]
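If the file may not exist yet, or if you call this from several places in the scraper, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name append_unique_emails is made up for illustration, the file holds one email per line, and emails are compared case-insensitively (drop the lower() calls if you want exact matching).

```python
from pathlib import Path


def append_unique_emails(path, new_emails):
    """Append emails that are not yet in the file; return those written.

    Hypothetical helper: assumes one email per line and
    case-insensitive comparison.
    """
    file_path = Path(path)
    seen = set()
    if file_path.exists():
        # Load what is already stored, ignoring blank lines.
        with file_path.open("r", encoding="utf-8") as f:
            seen = {line.strip().lower() for line in f if line.strip()}

    written = []
    with file_path.open("a", encoding="utf-8") as f:
        for raw in new_emails:
            email = raw.strip()  # remove leading/trailing whitespace
            key = email.lower()
            if not email or key in seen:
                continue  # skip empty entries and duplicates
            f.write(email + "\n")
            seen.add(key)  # remember it for the rest of this batch
            written.append(email)
    return written
```

With your code it would be called as append_unique_emails("scraped_emails.txt", cleanEmail). Keeping the known emails in a set makes each membership check cheap even when the file grows large, and opening the file in 'a' mode creates it on the first run.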