Home > Software engineering >  Filter DataFrame by regex and match condition
Filter DataFrame by regex and match condition

Time:09-02

I have the following DataFrame:

Periodicity Answer
M Yes
M YeS
Y yeS
M No
Y NO
M nO

I need to filter the DataFrame to get the rows that have a Monthly (M) periodicity and have a positive (YES, YEs, Yes, YeS, and so on) answer.

I have tried filter it with the following code:

import pandas as pd
import re

data = {'Periodicity': ['M', 'Y', 'M', 'M', 'M', 'Y', 'M', 'M'],
        'Answer': ['YES', 'Yes', 'YEs', 'NO', 'no', 'No', 'yeS', 'yeS']}
  
df = pd.DataFrame(data)

pat=r'^[Yy].*'

df_filter=df[df.Answer.str.contains(pat)]

But I dont know how to add another condition to filter the DataFrame to match the desired Periodicity. Everytime I add another filter condition, I get the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

Try this:

df_filter=df[df.Answer.str.contains(pat) & df.Periodicity.str.contains('M')]

CodePudding user response:

You can do this without regex (by using string lower method and dataframe filtering :

import pandas as pd
import re

data = {'Periodicity': ['M', 'Y', 'M', 'M', 'M', 'Y', 'M', 'M'],
        'Answer': ['YES', 'Yes', 'YEs', 'NO', 'no', 'No', 'yeS', 'yeS']}
  
df = pd.DataFrame(data)

df = df[(df['Answer'].str.lower() == 'yes') & (df['Periodicity'] == 'M')] # do this

print(df)

output:

  Periodicity Answer
0           M    YES
2           M    YEs
6           M    yeS
7           M    yeS
  • Related