Let's say I have a dataframe of integers between -100 and 100 on which I would like to apply two filters: value greater than 0, absolute value greater than 50. For now I apply these filters that i combined by multiplying them, as so:
df = pd.DataFrame(np.random.randint(-100,100,size=(100, 4)), columns=list('ABCD'))
greater_than_0 = (df>0).astype(int)
abs_greater_than_50 = (abs(df)>50).astype(int)
greater_than_50 = greater_than_0.multiply(abs_greater_than_50)
display(greater_than_50.head())
display(df*greater_than_50)
Another solution would be to use the following method:
greater_than_0 = np.where(df>0,1,0)
abs_greater_than_50 = np.where(abs(df)>50,1,0)
greater_than_50 = pd.DataFrame(data=greater_than_0*abs_greater_than_50, columns=df.columns)
display(greater_than_50.head())
display(df*greater_than_50)
I was wondering wath would be the best solution, knowing that I have hundreds of thousands of rows and hundreds of columns. Any totally different approach would be much appreciated if I have missed something!
(I know that the filter 'value greater than 50' would do the job using only one filter, it is just for the sake of an example)
CodePudding user response:
You can use &
to combine the boolean dataframes
df = pd.DataFrame(np.random.randint(-100,100,size=(100, 4)), columns=list('ABCD'))
cond = (df > 0) & (df.abs() > 50)
filtered_df = df[cond]
CodePudding user response:
you need to use pandas.DataFrame.where. please look at example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(-100,100,size=(100, 4)), columns=list('ABCD'))
greater_than_0 = (df>0).astype(int)
abs_greater_than_50 = (abs(df)>50).astype(int)
greater_than_50 = greater_than_0.multiply(abs_greater_than_50)
test1 = df*greater_than_50
greater_than_0 = (df>0)
abs_greater_than_50 = (abs(df)>50)
test2 = df.where(greater_than_0 & abs_greater_than_50, 0)
pd.testing.assert_frame_equal(test1, test2)