Home > database >  Compare pandas two different pandas dataframe to extract the difference
Compare pandas two different pandas dataframe to extract the difference

Time:08-11

I have two pandas dataframes and I want to compare them to see what the differences are. df1 has a column of all unique IDs, a column of text data, and a column of numeric data. df2 has the same structure but it contains multiple records of the same IDs. I want to take a specific ID from df1 and its corresponding columns then compare it to all the matching IDs in df2 and their corresponding columns. Then i want to take the difference and put them into new df3.

EDIT: df3 should not have rows from df1 if it does not exist in df2

import pandas as pd
data1 = {'ID':['L1', 'L2', 'L3', 'L4'], 'Text':['1A', '1B','1C','1D'], 'Num':[1, 2, 3, 4]}
df1 = pd.DataFrame(data1)
print(df1)
ID Text Num
L1 1A 1
L2 1B 2
L3 1C 3
L4 1D 4
data2 = {'ID':['L1', 'L2', 'L3', 'L1', 'L2', 'L3'], 'Text':['1A','1B','1C','2A','2B','1C'], 'Num':[1, 2, 3, 11,2,123]}
df2 = pd.DataFrame(data2)
print(df2)
ID Text Num
L1 1A 1
L2 1B 2
L3 1C 3
L4 1D 4
L1 2A 11
L2 2B 2
L3 1C 13

I want the out put to looks like:

ID Text Num
L1 2A 11
L2 2B 2
L3 1C 123

CodePudding user response:

You can concatenate the dataframes and discard all duplicate rows.

pd.concat([df1, df2]).drop_duplicates(keep=False)

CodePudding user response:

You can use an outer merge with indicator:

(df1.merge(df2, how='outer', indicator=True)
    .loc[lambda d: d.pop('_merge').eq('right_only')]
 )

Output:

   ID Text  Num
4  L1   2A   11
5  L2   2B    2
6  L3   1C  123

NB. If you need to keep the index, reset_index() before merge, then set_index('index') afterwards.

  • Related