Compare pandas two different pandas dataframe to extract the difference-CodePudding

I have two pandas dataframes and I want to compare them to see what the differences are. df1 has a column of all unique IDs, a column of text data, and a column of numeric data. df2 has the same structure but it contains multiple records of the same IDs. I want to take a specific ID from df1 and its corresponding columns then compare it to all the matching IDs in df2 and their corresponding columns. Then i want to take the difference and put them into new df3.

EDIT: df3 should not have rows from df1 if it does not exist in df2

import pandas as pd
data1 = {'ID':['L1', 'L2', 'L3', 'L4'], 'Text':['1A', '1B','1C','1D'], 'Num':[1, 2, 3, 4]}
df1 = pd.DataFrame(data1)
print(df1)

ID	Text	Num
L1	1A	1
L2	1B	2
L3	1C	3
L4	1D	4

data2 = {'ID':['L1', 'L2', 'L3', 'L1', 'L2', 'L3'], 'Text':['1A','1B','1C','2A','2B','1C'], 'Num':[1, 2, 3, 11,2,123]}
df2 = pd.DataFrame(data2)
print(df2)

ID	Text	Num
L1	1A	1
L2	1B	2
L3	1C	3
L4	1D	4
L1	2A	11
L2	2B	2
L3	1C	13

I want the out put to looks like:

ID	Text	Num
L1	2A	11
L2	2B	2
L3	1C	123

CodePudding user response：

You can concatenate the dataframes and discard all duplicate rows.

pd.concat([df1, df2]).drop_duplicates(keep=False)

CodePudding user response：

You can use an outer merge with indicator:

(df1.merge(df2, how='outer', indicator=True)
    .loc[lambda d: d.pop('_merge').eq('right_only')]
 )

Output:

   ID Text  Num
4  L1   2A   11
5  L2   2B    2
6  L3   1C  123

NB. If you need to keep the index, reset_index() before merge, then set_index('index') afterwards.