I have a dataframe df1
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri23_1 | Laavin | A |
2 | hsgç_T2 | Grendy | B |
3 | bbbj-1Y_jn | Fantol | C |
4 | hsgç_T2 | Gondow | D |
and have a dataframe df2
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri | Brendy | B |
2 | hsgç_T2 | Fantol | A |
5 | bbbj-1Y_jn | Fantol | E |
6 | hsgç_T3 | Gondow | D |
For id
in df1
that are present in df2
,I want to set Group
,Family
,Bonus
columns alone of df2
in df1
.
Expected df1
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri | Brendy | B |
2 | hsgç_T2 | Fantol | A |
3 | bbbj-1Y_jn | Fantol | C |
4 | hsgç_T2 | Gondow | D |
CodePudding user response:
shared_columns = ['Group', 'Family', 'Bonus']
replacements = (pd.merge(df1, df2, on='id', how='left')
.iloc[:, -len(shared_columns):])
replacements.columns = shared_columns
df1.update(replacements)
Example:
df1 = pd.DataFrame([[1, 1, 0.1, 1],
[2, 2, 0.2, 2],
[3, 3, 0.3, 3]],
columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3, 30, 300, 3],
[1, 10, 100, 1],
[4, 40, 400, 4]],
columns=['id', 'Group', 'Family', 'Bonus'])
df1
:
id Group Family Bonus
0 1 1 0.1 1
1 2 2 0.2 2
2 3 3 0.3 3
df2
:
id Group Family Bonus
0 3 30 300 3
1 1 10 100 1
2 4 40 400 4
pd.merge(df1, df2, on='id', how='left'
performs a left-merge, on the column id
:
id Group_x Family_x Bonus_x Group_y Family_y Bonus_y
0 1 1 0.1 1 10.0 100.0 1.0
1 2 2 0.2 2 NaN NaN NaN
2 3 3 0.3 3 30.0 300.0 3.0
replacements
(after selecting the last columns, and renaming them):
Group Family Bonus
0 10.0 100.0 1.0
1 NaN NaN NaN
2 30.0 300.0 3.0
Resulting df1
:
id Group Family Bonus
0 1 10.0 100.0 1.0
1 2 2.0 0.2 2.0
2 3 30.0 300.0 3.0
CodePudding user response:
if setting index on DF is acceptable, then this is one way to accomplish it. Using update
df.set_index('id', inplace=True)
df.update(df2.set_index('id'))
df
Group Family Bonus
id
1 tri Brendy B
2 hsgç_T2 Fantol A
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D