Having a couple of dataframes like that (df, df2),
df
D R1 R2 R3
0 D1 1 1 1
1 D1 1 1 1
2 D2 1 2 1
3 D2 1 2 1
4 D3 1 0 1
df2
D R1 R2 R3
0 D1 1 1 1
1 D1 1 1 1
2 D2 1 3 1
3 D2 1 3 1
4 D3 1 1 1
5 D3 2 2 2
6 D3 2 2 2
Is it possible to merge them and create an additional column called "new_values" with those values that only exist in one of the two dataframes ?
Expected result :
D R1 R2 R3 _merge new_values
0 D1 1 1 1 both False
1 D1 1 1 1 both False
2 D1 1 1 1 both False
3 D1 1 1 1 both False
4 D2 1 2 1 left_only False
5 D2 1 2 1 left_only False
6 D3 1 0 1 left_only False
7 D2 1 3 1 right_only False
8 D2 1 3 1 right_only False
9 D3 1 1 1 right_only False
10 D3 2 2 2 right_only True
11 D3 2 2 2 right_only True
CodePudding user response:
Use outer join with merge
and test memebrship of indices in Series.isin
:
df = (df.reset_index()
.merge(df2.reset_index(),
how='outer',
indicator=True,
on=df.columns.tolist()))
df['new_values'] = ~df.pop('index_y').isin(df.pop('index_x'))
print (df)
D R1 R2 R3 _merge new_values
0 D1 1 1 1 both False
1 D1 1 1 1 both False
2 D1 1 1 1 both False
3 D1 1 1 1 both False
4 D2 1 2 1 left_only False
5 D2 1 2 1 left_only False
6 D3 1 0 1 left_only False
7 D2 1 3 1 right_only False
8 D2 1 3 1 right_only False
9 D3 1 1 1 right_only False
10 D3 2 2 2 right_only True
11 D3 2 2 2 right_only True
CodePudding user response:
You could try:
df3 = (df.assign(new_values=~df.index.isin(df2.index))
.merge(df2.assign(new_values=~df2.index.isin(df.index)),
how='outer', indicator=True)
)
output:
D R1 R2 R3 new_values _merge
0 D1 1 1 1 False both
1 D1 1 1 1 False both
2 D1 1 1 1 False both
3 D1 1 1 1 False both
4 D2 1 2 1 False left_only
5 D2 1 2 1 False left_only
6 D3 1 0 1 False left_only
7 D2 1 3 1 False right_only
8 D2 1 3 1 False right_only
9 D3 1 1 1 False right_only
10 D3 2 2 2 True right_only
11 D3 2 2 2 True right_only