I have the following row based dataset in pandas and want to filter modules that fulfill multiple critera, e.g. all modules with (setting='Foo' and value=2) and (setting='bar' and value=1)
Currently i am getting seperatly the modules for each criteria and then building the intersection of all single results. Is there a more efficient way, e.g. to transfer the dataset into a column based dataset and then filtering with one query (setting_foo=2 and settting_bar=3)
module setting value
-----------------------------------------
A Foo 2
A bar 1
B Foo 3
B bar 1
...
Desired output for this example would be module A because module B fulfills bar=1 but not foo=2
CodePudding user response:
You can use .pivot
to make columns for Foo and Bar for each module, as below:
df_pivot = df.pivot(index='module', columns='setting', values='value').reset_index()
Gives:
setting module Foo bar
0 A 2 1
1 B 3 1
Then you can use .loc
to filter where Foo=2 and bar=1, and filter on the module column, as below:
df_pivot.loc[(df_pivot['Foo'] == 2) & (df_pivot['bar'] == 1), 'module']
Which outputs:
0 A
Name: module, dtype: object