Home > database >  Python pandas : How to find difference between two dataframe based on single column
Python pandas : How to find difference between two dataframe based on single column

Time:11-22

I have two dataframes

df1 = pd.DataFrame({
    'Date':['2013-11-24','2013-11-24','2013-11-25','2013-11-25'],
    'Fruit':['Banana','Orange','Apple','Celery'],
    'Num':[22.1,8.6,7.6,10.2],
    'Color':['Yellow','Orange','Green','Green'],
    })
print(df1)
         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Yellow
1  2013-11-24  Orange   8.6  Orange
2  2013-11-25   Apple   7.6   Green
3  2013-11-25  Celery  10.2   Green

df2 = pd.DataFrame({
    'Date':['2013-11-25','2013-11-25','2013-11-25','2013-11-25','2013-11-25','2013-11-25'],
    'Fruit':['Banana','Orange','Apple','Celery','X','Y'],
    'Num':[22.1,8.6,7.6,10.2,22.1,8.6],
    'Color':['Yellow','Orange','Green','Green','Red','Orange'],
    })
print(df2)
         Date   Fruit   Num   Color
0  2013-11-25  Banana  22.1  Yellow
1  2013-11-25  Orange   8.6  Orange
2  2013-11-25   Apple   7.6   Green
3  2013-11-25  Celery  10.2   Green
4  2013-11-25       X  22.1     Red
5  2013-11-25       Y   8.6  Orange

I am trying to find out the difference between these two dataframes based on the column Fruit

This is what i am doing now but i am not getting the expected output

mapped_df = pd.concat([df1,df2],ignore_index=True).drop_duplicates(keep=False)
print(mapped_df)

Expected output

         Date Fruit   Num   Color
8  2013-11-25     X  22.1     Red
9  2013-11-25     Y   8.6  Orange

CodePudding user response:

You can use the negated isin:

output = df2.loc[~df2['Fruit'].isin(df1['Fruit'])]
  • Related