Home > database >  Setting value of columns in one dataframe to another dataframe column based on condition
Setting value of columns in one dataframe to another dataframe column based on condition

Time:08-24

I have a dataframe df1 :-

id Group Family Bonus
1 tri23_1 Laavin A
2 hsgç_T2 Grendy B
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D

and have a dataframe df2 :-

id Group Family Bonus
1 tri Brendy B
2 hsgç_T2 Fantol A
5 bbbj-1Y_jn Fantol E
6 hsgç_T3 Gondow D

For id in df1 that are present in df2,I want to set Group ,Family,Bonus columns alone of df2 in df1. Expected df1:-

id Group Family Bonus
1 tri Brendy B
2 hsgç_T2 Fantol A
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D

CodePudding user response:

shared_columns = ['Group', 'Family', 'Bonus']

replacements = (pd.merge(df1, df2, on='id', how='left')
                  .iloc[:, -len(shared_columns):])
replacements.columns = shared_columns

df1.update(replacements)

Example:

df1 = pd.DataFrame([[1,  1, 0.1, 1], 
                    [2,  2, 0.2, 2], 
                    [3,  3, 0.3, 3]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3,  30, 300, 3], 
                    [1,  10, 100, 1], 
                    [4,  40, 400, 4]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])

df1:

   id  Group  Family  Bonus
0   1      1     0.1      1
1   2      2     0.2      2
2   3      3     0.3      3

df2:

   id  Group  Family  Bonus
0   3     30     300      3
1   1     10     100      1
2   4     40     400      4

pd.merge(df1, df2, on='id', how='left' performs a left-merge, on the column id:

   id  Group_x  Family_x  Bonus_x  Group_y  Family_y  Bonus_y
0   1        1       0.1        1     10.0     100.0      1.0
1   2        2       0.2        2      NaN       NaN      NaN
2   3        3       0.3        3     30.0     300.0      3.0

replacements (after selecting the last columns, and renaming them):

   Group  Family  Bonus
0   10.0   100.0    1.0
1    NaN     NaN    NaN
2   30.0   300.0    3.0

Resulting df1:

   id  Group  Family  Bonus
0   1   10.0   100.0    1.0
1   2    2.0     0.2    2.0
2   3   30.0   300.0    3.0

CodePudding user response:

if setting index on DF is acceptable, then this is one way to accomplish it. Using update

df.set_index('id', inplace=True)
df.update(df2.set_index('id'))
df
    Group       Family  Bonus
id          
1   tri         Brendy  B
2   hsgç_T2     Fantol  A
3   bbbj-1Y_jn  Fantol  C
4   hsgç_T2     Gondow  D
  • Related