Filter dataframe based on matching values from two columns

Time:02-10

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd

cdf = pd.DataFrame({'Id':[1,2,3,4,5],
                    'Label':[1,2,3,0,0]})

I would like to filter the dataframe based on the following criterion:

cdf['Id']==cdf['Label']  # first 3 rows are matching for both columns in cdf

I tried the following:

flag = np.where[cdf['Id'].eq(cdf['Label'])==True,1,0]
final_df = cdf[cdf['flag']==1]

but I got the following error:

TypeError: 'function' object is not subscriptable

I expect my output to look like this:

   Id  Label
0   1      1
1   2      2
2   3      3

CodePudding user response:

I think you're overthinking this. Just compare the columns:

>>> cdf[cdf['Id'] == cdf['Label']]
   Id  Label
0   1      1
1   2      2
2   3      3

The error itself comes from using square brackets on np.where, i.e. np.where[...]. np.where is a function, so subscripting it raises the TypeError; call it with parentheses, np.where(...), instead. That said, the direct comparison above is simpler and should be at least as fast ;)
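
For reference, here is a minimal sketch of the original np.where approach with the brackets corrected. It assumes the flag was meant to be stored as a column named flag; the question's code assigned it to a separate variable, so cdf['flag'] would raise a KeyError even once the brackets are fixed. Dropping the helper column at the end is also an assumption, made only to match the expected output.

```python
import numpy as np
import pandas as pd

cdf = pd.DataFrame({'Id': [1, 2, 3, 4, 5],
                    'Label': [1, 2, 3, 0, 0]})

# np.where is a function: call it with parentheses instead of subscripting it.
# Assumption: store the result as a 'flag' column so cdf['flag'] exists below.
cdf['flag'] = np.where(cdf['Id'].eq(cdf['Label']), 1, 0)

# Keep the matching rows, then drop the helper column.
final_df = cdf[cdf['flag'] == 1].drop(columns='flag')
print(final_df)
```

The intermediate flag column is not really needed here, since cdf['Id'].eq(cdf['Label']) is already a boolean mask that can be used to index cdf directly.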

CodePudding user response:

You can also use query:

cdf.query('Id == Label')
Out[248]: 
   Id  Label
0   1      1
1   2      2
2   3      3
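
As a quick sanity check, the boolean-mask and query approaches from this thread can be compared side by side. This is only a sketch on the question's sample data; the variable names by_mask and by_query are made up for illustration.

```python
import pandas as pd

cdf = pd.DataFrame({'Id': [1, 2, 3, 4, 5],
                    'Label': [1, 2, 3, 0, 0]})

# Boolean mask built from a direct column comparison.
by_mask = cdf[cdf['Id'] == cdf['Label']]

# The same filter written as a query expression string.
by_query = cdf.query('Id == Label')

# DataFrame.equals compares values, dtypes, and index labels.
same = by_mask.equals(by_query)
print(same)
```

Both forms keep the original index labels of the matching rows, so either can be used as a drop-in replacement for the other on this data.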