Home > Mobile >  How to update column value of a data frame from another data frame matching 2 columns?
How to update column value of a data frame from another data frame matching 2 columns?

Time:06-01

I have 2 dataframes, and I want to update the score of rows with the same 2 column values.

How can I do that?

df 1:

DEP ID    |  Team ID |  Group    |   Score
001       |  002     |    A      |    50
001       |  004     |    A      |    70
002       |  002     |    A      |    50
002       |  007     |    A      |    90

df 2 (a subset of one department):

DEP ID    |  Team ID |  Group    |   Result
001       |  002     |    A      |    80
001       |  003     |    A      |    60
001       |  004     |    A      |    70

OUTPUT: All columns with the same TeamID and Group update the score

DEP ID    |  Team ID |  Group    |   Score
001       |  002     |    A      |    80
001       |  004     |    A      |    70
002       |  002     |    A      |    80
002       |  007     |    A      |    90

I've tried doing pd merge left join but I'm not really getting the expected result.

Any suggestions?

CodePudding user response:

Here's a way to do it:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')

Explanation:

  • modify df2 so it has Team ID, Group as its index and its only column is Result
  • use join to bring the new scores from df2 into a Result column in df1
  • use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available)
  • drop the Result column.

Full test code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})

print(df1)
print(df2)

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)

Output:

   index DEP ID Team ID Group  Score
0      0    001     002     A     80
1      1    001     004     A     70
2      2    002     002     A     80
3      3    002     007     A     90

UPDATE:

If Result column in df2 is instead named Score, as asked by OP in a comment, then the code can be adjusted slightly as follows:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
  • Related