Say I have a dataframe:
Do | Re | Mi | Fa | So |
---|---|---|---|---|
1 | 0 | Foo | 100 | 50 |
1 | 1 | Bar | 75 | 20 |
0 | 0 | True | 59 | 59 |
1 | 1 | False | 0 | 12 |
How would I go about finding all rows where the value in columns "Do" and "Re" BOTH equal 1 AND "Fa" is higher than "So"?
I've tried a couple of ways, but they first returns an error complaining of ambiguity:
df['both_1']=((df['Do'] > df['Re']) & (df['Mi'] == df['Fa'] == 1))
I also tried breaking it down into steps, but I realised the last step will result in me bringing in both True and False statements. I only want True.
df['Do_1'] = df['Do'] == 1
df['Re_1'] = df['Re'] == 1
# This is where I realised I'm bringing in the False rows too
df['both_1'] = (df['Do1'] == df['Re_1'])```
CodePudding user response:
Chain mask by another &
for bitwise AND
:
df['both_1']= (df['Fa'] > df['So']) & (df['Do'] == 1) & (df['Re'] == 1)
print (df)
Do Re Mi Fa So both_1
0 1 0 Foo 100 50 False
1 1 1 Bar 75 20 True
2 0 0 True 59 59 False
3 1 1 False 0 12 False
Or if possible multiple columns in list comapre filtered columns by subset df[['Do', 'Re']]
and test if all Trues by DataFrame.all
:
df['both_1']= (df['Fa'] > df['So']) & (df[['Do', 'Re']] == 1).all(axis=1)
If need filter use boolean indexing
:
df1 = df[(df['Fa'] > df['So']) & (df['Do'] == 1) & (df['Re'] == 1)]
For second solution:
df1 = df[(df['Fa'] > df['So']) & (df[['Do', 'Re']] == 1).all(axis=1)]
CodePudding user response:
You can combine the three conditions using &
(bitwise and):
df[(df['Do'] == 1) & (df['Re'] == 1) & (df['Fa'] > df['So'])]
Output (for your sample data):
Do Re Mi Fa So
1 1 1 Bar 75 20
CodePudding user response:
Try:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Do':[1,1,0,1],'Re':[0,1,0,1],'Mi':['Foo','Bar',True,False],'Fa':[100,75,59,0],'Sol':[50,20,59,12]})
df.loc[np.where((df.Do == 1) & (df.Re ==1) & (df.Fa > df.Sol))[0]]
Output:
Do Re Mi Fa Sol
1 1 1 Bar 75 20
If you do not want to import anything else, just do this:
df.assign(keep=[1 if (df.loc[i,'Do'] == 1 and df.loc[i,'Re'] == 1 and df.loc[i,'Fa'] > df.loc[i,'Sol']) else 0 for i in df.index]).query("keep == 1").drop('keep',axis=1)