My question is similar to this one here, but with a couple of critical differences which I'll try to make clear below.
I have two dataframes:
import pandas as pd

df1 = pd.DataFrame({'id_A':['0001', '0002', '0003', '0004', '0005'],
                    'id_B':['0010', '0020', '0030', '0040', '0050'],
                    'value':['A','B','C','D','E']})
df2 = pd.DataFrame({'id_a':['0020', '0010', '0004', '0003', '0005'],
                    'id_b':['0002', None, '0040', None, '0050'],
                    'value':[1,2,3,4,5]})
>>> df1
   id_A  id_B value
0  0001  0010     A
1  0002  0020     B
2  0003  0030     C
3  0004  0040     D
4  0005  0050     E
>>> df2
   id_a  id_b  value
0  0020  0002      1
1  0010  None      2
2  0004  0040      3
3  0003  None      4
4  0005  0050      5
Each item (or row) has one or two unique id numbers. These unique id numbers appear in both tables, but one table may be less complete than the other and may only list one of these id numbers for a row when two actually exist. What I want as an output is something like this:
>>> df_final
   id_A  id_B Value  value
0  0001  0010     A      2
1  0002  0020     B      1
2  0003  0030     C      4
3  0004  0040     D      3
4  0005  0050     E      5
The final dataframe should have the same number of rows as df1. Currently I'm at a loss, so any help would be appreciated.
CodePudding user response:
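One option is to use update before merging:
# Give df2 the same column names as df1 so that update() can match the columns
df2.columns = df1.columns
# Rename df1's value column so the two value columns stay distinct after the merge
df1 = df1.rename(columns={'value':'Value'})
# Overwrite df2's id columns with df1's ids, row by row (aligned on the index)
df2.update(df1)
df1.merge(df2, on = ['id_A', 'id_B'])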
   id_A  id_B Value  value
0  0001  0010     A      1
1  0002  0020     B      2
2  0003  0030     C      3
3  0004  0040     D      4
4  0005  0050     E      5
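This is restrictive: update aligns the two frames on their index, not on the ids, so it only gives the right pairing when the rows of both frames are already in the same order. Here they are not, which is why the value column comes out as 1 to 5 rather than the requested 2, 1, 4, 3, 5.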
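If the rows cannot be assumed to line up by index, one possible alternative (a sketch, not part of the answer above) is to build an id-to-value lookup from df2 and map both id columns of df1 through it. It assumes that each id identifies a single item no matter which column it appears in; the names lookup and matched are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id_A': ['0001', '0002', '0003', '0004', '0005'],
                    'id_B': ['0010', '0020', '0030', '0040', '0050'],
                    'value': ['A', 'B', 'C', 'D', 'E']})
df2 = pd.DataFrame({'id_a': ['0020', '0010', '0004', '0003', '0005'],
                    'id_b': ['0002', None, '0040', None, '0050'],
                    'value': [1, 2, 3, 4, 5]})

# Long format: one row per (id, value) pair, with the missing ids dropped.
# Assumption: an id refers to the same item whichever column it sits in.
lookup = (
    df2.melt(id_vars='value', value_vars=['id_a', 'id_b'], value_name='id')
       .dropna(subset=['id'])
       .drop_duplicates('id')
       .set_index('id')['value']
)

# Look each df1 row up by id_A first and fall back to id_B.
matched = df1['id_A'].map(lookup).combine_first(df1['id_B'].map(lookup))

# Nullable integers keep the values integral even if some row has no match.
df_final = (
    df1.rename(columns={'value': 'Value'})
       .assign(value=matched.astype('Int64'))
)
```

On the sample data this gives value = 2, 1, 4, 3, 5 and keeps exactly one row per row of df1; a row of df1 with no matching id in df2 would end up with a missing value rather than being dropped.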
CodePudding user response:
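Try this, with l1 and l2 being the id columns of each dataframe:
l1 = ['id_A', 'id_B']  # id columns of df1
l2 = ['id_a', 'id_b']  # id columns of df2
# Keep only the df1 ids that occur somewhere in df2, then build an order-independent key per row
df1['key'] = df1[l1].where(df1[l1].isin(df2[l2].stack().tolist())).fillna(0).apply(frozenset,axis=1)
df2['key'] = df2[l2].fillna(0).apply(frozenset,axis=1)
ndf = pd.merge(df1,df2[['key','value']].rename(columns={'value':'Value'}),on = 'key',how='left').drop('key',axis=1)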
Output:
   id_A  id_B value  Value
0  0001  0010     A      2
1  0002  0020     B      1
2  0003  0030     C      4
3  0004  0040     D      3
4  0005  0050     E      5
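The reason the key column works is that a frozenset is hashable and ignores order, so the same pair of ids produces the same key whichever column each id sits in. A minimal illustration in plain Python, using made-up tuples that mirror rows of the frames above (the variable names are illustrative only):

```python
# Same two ids, listed in opposite order in the two tables.
row_df1 = ('0002', '0020')
row_df2 = ('0020', '0002')
key1 = frozenset(row_df1)
key2 = frozenset(row_df2)
pair_matches = key1 == key2 and hash(key1) == hash(key2)  # order does not matter

# A missing id is replaced by the placeholder 0 on both sides,
# so a row known by only one id still gets a matching key.
partial_df1 = frozenset((0, '0010'))   # '0001' was blanked out because df2 never lists it
partial_df2 = frozenset(('0010', 0))   # id_b was None in df2
partial_matches = partial_df1 == partial_df2
```

Both comparisons are true, which is what lets merge treat the key column like any other join column.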