Lazily evaluate pandas DataFrame filters

Time:11-12

I'm observing behavior that seems odd to me. How can I define a filter once and reuse it throughout my code?

>>> df = pd.DataFrame([1,2,3], columns=['A'])
>>> my_filter = df.A == 2
>>> df.loc[1] = 5
>>> df[my_filter]
   A
1  5

I expect my_filter to return an empty DataFrame, since none of the values in column A equal 2 any more.

I'm thinking about writing a function that returns the filter and reusing that, but is there a more Pythonic (and more pandas-idiomatic) way of doing this?

def get_my_filter(df):
    return df.A == 2

df[get_my_filter(df)]
df.loc[1] = 5  # change df
df[get_my_filter(df)]
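The function approach does work, because the comparison is re-run every time the function is called. pandas can also take the function directly: .loc (and []) accept a callable that receives the DataFrame and returns a boolean mask, so the condition is evaluated against the current data at indexing time. A minimal sketch, assuming the same single-column data as above; the name is_two is hypothetical:

```python
import pandas as pd

# Define the condition once, as a callable rather than a precomputed mask.
# `is_two` is a hypothetical name used only for illustration.
def is_two(frame):
    return frame.A == 2

df = pd.DataFrame([1, 2, 3], columns=['A'])

before = df.loc[is_two]   # .loc calls is_two(df) now: selects row 1 (A == 2)
df.loc[1] = 5             # change the data
after = df.loc[is_two]    # is_two is called again on the updated df: no rows match

print(before)
print(after)
```

A lambda works the same way, e.g. df.loc[lambda d: d.A == 2], which keeps the condition inline while still deferring its evaluation.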

Answer:

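Masks are not dynamic: a mask is computed from the data as it is at the moment you define it, and it does not change afterwards. So if you need to change a value in the dataframe, swap lines 2 and 3 of your example (assign the new value first, then build the mask). That will work.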
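A sketch of that reordering, using the question's data with pandas imported explicitly: the value is changed first, so the mask is built from the current column.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['A'])
df.loc[1] = 5            # change the data first...
my_filter = df.A == 2    # ...then build the mask from the current values
result = df[my_filter]
print(result)            # no value in column A equals 2, so no rows are selected
```

This only helps when all changes happen before the filter is defined; if the data keeps changing, the mask has to be rebuilt each time.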

Answer:

The filter was evaluated when you created it, so changing a value in the row afterwards won't affect it.

import pandas as pd

df = pd.DataFrame([1,2,3], columns=['A'])
my_filter = df.A == 2
print(my_filter)
'''
0    False
1     True
2    False
Name: A, dtype: bool
'''

As you can see, it returns a boolean Series. That Series holds the result of the comparison on the original version of df, so if you change the data afterwards, the filter will not reflect the change. You can, however, define the filter as a string and pass it to Python's built-in eval() function each time you need it, which evaluates the comparison against the current df:

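import pandas as pd

df = pd.DataFrame([1,2,3], columns=['A'])
my_filter = 'df.A == 2'   # store the condition as a string, not as a mask
df.loc[1] = 5
df[eval(my_filter)]       # the string is evaluated now, against the updated df

'''
Empty DataFrame
Columns: [A]
Index: []
'''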
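If you want to keep the condition as a string, pandas also has its own expression methods: DataFrame.query evaluates an expression against the frame's columns at call time, so the string does not need to mention df, and the same string can be reused after the data changes. DataFrame.eval returns the boolean mask itself. A minimal sketch; the constant name MY_FILTER is hypothetical:

```python
import pandas as pd

# Store only the condition; the column name is resolved when query()/eval() runs.
MY_FILTER = 'A == 2'

df = pd.DataFrame([1, 2, 3], columns=['A'])
first = df.query(MY_FILTER)    # row 1, where A == 2

df.loc[1] = 5                  # change the data
second = df.query(MY_FILTER)   # re-evaluated on the new values: no rows match

# DataFrame.eval gives back the boolean mask, if you need it for other indexing
mask = df.eval(MY_FILTER)

print(first)
print(second)
print(mask)
```

Like the built-in eval(), these methods should not be fed untrusted input.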