I have the following DataFrame:
Periodicity | Answer |
---|---|
M | Yes |
M | YeS |
Y | yeS |
M | No |
Y | NO |
M | nO |
I need to filter the DataFrame to get the rows that have a Monthly (M) periodicity and have a positive (YES, YEs, Yes, YeS, and so on) answer.
I have tried filter it with the following code:
import pandas as pd
import re
data = {'Periodicity': ['M', 'Y', 'M', 'M', 'M', 'Y', 'M', 'M'],
'Answer': ['YES', 'Yes', 'YEs', 'NO', 'no', 'No', 'yeS', 'yeS']}
df = pd.DataFrame(data)
pat=r'^[Yy].*'
df_filter=df[df.Answer.str.contains(pat)]
But I dont know how to add another condition to filter the DataFrame to match the desired Periodicity. Everytime I add another filter condition, I get the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all
().
CodePudding user response:
Try this:
df_filter=df[df.Answer.str.contains(pat) & df.Periodicity.str.contains('M')]
CodePudding user response:
You can do this without regex (by using string lower
method and dataframe filtering :
import pandas as pd
import re
data = {'Periodicity': ['M', 'Y', 'M', 'M', 'M', 'Y', 'M', 'M'],
'Answer': ['YES', 'Yes', 'YEs', 'NO', 'no', 'No', 'yeS', 'yeS']}
df = pd.DataFrame(data)
df = df[(df['Answer'].str.lower() == 'yes') & (df['Periodicity'] == 'M')] # do this
print(df)
output:
Periodicity Answer
0 M YES
2 M YEs
6 M yeS
7 M yeS