I have this dataframe, which should contain only e-mails:
email
1 [email protected] # it is not an e-mail, so delete it
2 [email protected] # it is an e-mail, so keep it
3 [email protected] # it is not an e-mail, so delete it
4 [email protected] #...
How can I delete the rows that aren't e-mails? Maybe with a condition such as: if the value after the dot (.) is a number or an image extension such as .png, delete the row. How can I achieve this? Do you have a better solution?
Update:
This is the pattern I used to scrape them:
mail_list = re.findall(r'\w+@\w+\.{1}\w+', html_text)
CodePudding user response:
Only you know the exact selection condition, but assuming that in a valid address @ is followed by a non-digit, you could use:
df2 = df[df['email'].str.contains(r'@\D', regex = True)]
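To see what this filter keeps, here is a minimal sketch on made-up data. The sample values are hypothetical (the real addresses are not visible in the question) and only mimic what the scraping pattern might pick up from image file names.

```python
import pandas as pd

# Hypothetical sample values, not the asker's real data.
df = pd.DataFrame({
    'email': ['logo@2x.png', 'john@example.com', 'icon@3.0', 'mary@mail.org']
})

# Keep only rows where the character right after '@' is not a digit.
df2 = df[df['email'].str.contains(r'@\D', regex=True)]
print(df2)
```

Note that this only checks the single character after @, so a string such as `logo@x2.png` would still pass.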
CodePudding user response:
You could use a regex like:
df2 = df[df['email'].str.contains(r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$', regex=True)]
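One caveat with this pattern (written here with `+` quantifiers): a three-letter image extension such as `png` also satisfies `\w{2,3}`, so a string like `logo@2x.png` would still match. A possible workaround, following the idea in the question, is to combine the pattern with an explicit check for image extensions. The sketch below uses hypothetical sample data and an assumed extension list that you would adapt to your own case.

```python
import pandas as pd

# Hypothetical sample values, not the asker's real data.
df = pd.DataFrame({
    'email': ['logo@2x.png', 'john.doe@example.com', 'icon@3.0', 'mary@mail.org']
})

# Pattern from this answer: lowercase local part, one optional '.' or '_',
# then '@', a domain, and a 2-3 character suffix.
email_pattern = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
looks_like_email = df['email'].str.contains(email_pattern, regex=True)

# Assumed list of image extensions to reject; extend it as needed.
image_pattern = r'\.(?:png|jpe?g|gif|svg|webp)$'
is_image = df['email'].str.contains(image_pattern, case=False, regex=True)

# Keep rows that look like an e-mail and do not end in an image extension.
df2 = df[looks_like_email & ~is_image]
print(df2)
```

Keep in mind that `\w{2,3}` also rejects addresses with longer suffixes (for example `.info`), and the character classes only accept lowercase letters, so the pattern may need loosening for real data.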