I have a dataframe like as shown below
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
'Label':[1,2,3,0,0]})
I would like to filter the dataframe based on the below criteria
cdf['Id']==cdf['Label'] # first 3 rows are matching for both columns in cdf
I tried the below
flag = np.where[cdf['Id'].eq(cdf['Label'])==True,1,0]
final_df = cdf[cdf['flag']==1]
but I got the below error
TypeError: 'function' object is not subscriptable
I expect my output to be like as shown below
Id Label
0 1 1
1 2 2
2 3 3
CodePudding user response:
I think you're overthinking this. Just compare the columns:
>>> cdf[cdf['Id'] == cdf['Label']]
Id Label
0 1 1
1 2 2
2 3 3
Your particular error though is coming from the fact that you're using square brackets to call np.where
, e.g. np.where[...]
, which is wrong. You should be using np.where(...)
instead, but the above solution is bound to be as fast as it gets ;)
CodePudding user response:
Also you can check query
cdf.query('Id == Label')
Out[248]:
Id Label
0 1 1
1 2 2
2 3 3