I have two pandas dataframes and I want to compare them to see what the differences are. df1 has a column of all unique IDs, a column of text data, and a column of numeric data. df2 has the same structure but it contains multiple records of the same IDs. I want to take a specific ID from df1 and its corresponding columns then compare it to all the matching IDs in df2 and their corresponding columns. Then i want to take the difference and put them into new df3.
EDIT: df3 should not have rows from df1 if it does not exist in df2
import pandas as pd
data1 = {'ID':['L1', 'L2', 'L3', 'L4'], 'Text':['1A', '1B','1C','1D'], 'Num':[1, 2, 3, 4]}
df1 = pd.DataFrame(data1)
print(df1)
ID | Text | Num |
---|---|---|
L1 | 1A | 1 |
L2 | 1B | 2 |
L3 | 1C | 3 |
L4 | 1D | 4 |
data2 = {'ID':['L1', 'L2', 'L3', 'L1', 'L2', 'L3'], 'Text':['1A','1B','1C','2A','2B','1C'], 'Num':[1, 2, 3, 11,2,123]}
df2 = pd.DataFrame(data2)
print(df2)
ID | Text | Num |
---|---|---|
L1 | 1A | 1 |
L2 | 1B | 2 |
L3 | 1C | 3 |
L4 | 1D | 4 |
L1 | 2A | 11 |
L2 | 2B | 2 |
L3 | 1C | 13 |
I want the out put to looks like:
ID | Text | Num |
---|---|---|
L1 | 2A | 11 |
L2 | 2B | 2 |
L3 | 1C | 123 |
CodePudding user response:
You can concatenate the dataframes and discard all duplicate rows.
pd.concat([df1, df2]).drop_duplicates(keep=False)
CodePudding user response:
You can use an outer merge with indicator:
(df1.merge(df2, how='outer', indicator=True)
.loc[lambda d: d.pop('_merge').eq('right_only')]
)
Output:
ID Text Num
4 L1 2A 11
5 L2 2B 2
6 L3 1C 123
NB. If you need to keep the index, reset_index()
before merge, then set_index('index')
afterwards.