I am trying to add a TRUE/FALSE flag to an email message dataframe that has the columns SenderEmail, Counterparties, and MessageBody:
df['Spam'] = df['SenderEmail'].apply(lambda x: True if "no" and "reply" in x.lower() else "")
df['Spam'] = df['MessageBody'].apply(lambda x: True if "please do not reply" in x.lower() else "")
The code works, but I realised that when I run the two lines one after the other, the result of the second line overwrites the result of the first, leaving me with only the second line's results. I can't remove the else "" while using this approach, so I was thinking of using a for loop instead, but I'm not sure how to do so.
CodePudding user response:
You can use
df['Spam'] = (df['SenderEmail'].str.contains('^(?=.*no)(?=.*reply)', case=False) |
              df['MessageBody'].str.contains('please do not reply', case=False))
Here,
df['SenderEmail'].str.contains('^(?=.*no)(?=.*reply)', case=False) checks if the SenderEmail column value contains both substrings "no" and "reply", in any order
df['MessageBody'].str.contains('please do not reply', case=False) checks if the MessageBody column contains the "please do not reply" substring.
The case=False argument enables case-insensitive matching.
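As a side note on why a lookahead pattern is needed at all: in the question's lambda, "no" and "reply" in x.lower() is parsed as "no" and ("reply" in x.lower()), and because the non-empty string "no" is always truthy, only "reply" is actually tested. Below is a minimal sketch with the standard re module that contrasts the two checks. The helper names has_both and buggy_check and the sample addresses are made up for illustration.

```python
import re

# Two lookaheads anchored at the start: each one scans the whole string,
# so "no" and "reply" may appear in any order.
BOTH = re.compile(r'^(?=.*no)(?=.*reply)', re.IGNORECASE)

def has_both(text):
    """Hypothetical helper: True when both substrings occur in text."""
    return BOTH.search(text) is not None

def buggy_check(text):
    """Same expression as in the question; effectively tests only 'reply'."""
    return bool("no" and "reply" in text.lower())

samples = [
    "NoReply@example.com",
    "reply-to@example.com",
    "reply.no@example.com",
    "info@example.com",
]
for s in samples:
    print(s, has_both(s), buggy_check(s))
```

The second sample is where the two checks disagree: it contains "reply" but not "no", so the lookahead version rejects it while the original expression accepts it.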
Pandas test:
import pandas as pd
df = pd.DataFrame(
    {'SenderEmail': ['no reply', 'reply', 'no', 'and more no some reply'],
     'MessageBody': ['ok', 'please do not reply', 'ok', 'ok']})
df['Spam'] = (df['SenderEmail'].str.contains('^(?=.*no)(?=.*reply)', case=False) |
              df['MessageBody'].str.contains('please do not reply', case=False))
# => df
#               SenderEmail          MessageBody   Spam
# 0                no reply                   ok   True
# 1                   reply  please do not reply   True
# 2                      no                   ok  False
# 3  and more no some reply                   ok   True
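If the regex feels opaque, the same idea can be sketched by building each condition as its own boolean Series and combining them with & and |. Because Spam is assigned only once, this also avoids the overwrite problem from the question. The sketch assumes that missing values should count as not spam (na=False) and that the search strings should be matched literally (regex=False). The intermediate variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'SenderEmail': ['no reply', 'reply', 'no', None],
     'MessageBody': ['ok', 'please do not reply', 'ok', 'ok']})

sender = df['SenderEmail']
# One boolean mask per condition; na=False turns missing values into False.
has_no = sender.str.contains('no', case=False, regex=False, na=False)
has_reply = sender.str.contains('reply', case=False, regex=False, na=False)
body_hit = df['MessageBody'].str.contains(
    'please do not reply', case=False, regex=False, na=False)

# Single assignment: sender has both words, OR the body has the phrase.
df['Spam'] = (has_no & has_reply) | body_hit
print(df['Spam'].tolist())
```

Keeping the masks separate makes it easy to add further rules later by OR-ing in another mask rather than reassigning the column.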