Home > Net >  Merge AND sum or concatenate (with TWO dataframes)
Merge AND sum or concatenate (with TWO dataframes)

Time:11-25

(This post does not explain what was requested below. There is the issue of merge before concatenation or sum, using these functions together to fill a column is the question.)

I have two dataframes that look like these (obviously these are examples):

data1 = {'Name': ['Alex','Alex','Cristiano','Cristiano','Fernando','Jonas','William'], 'Color': ['Blue','Red','Black','Blue','Yellow','Pink','Green'], 
    'Codes': ['','','','','','',''], 'Values': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
data2 = {'Name': ['Jonas','Alex','Cristiano','Cristiano','Alex'], 'Color': ['Pink','Red','Black','Blue','Red'], 
    'Codes': ['1456','1450','1453','1530','1459'], 'Values': [12000.00,5000.50,78000.00,2000.00,1500.00]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

Are so:

df1:
        Name   Color Codes  Values
0       Alex    Blue           NaN
1       Alex     Red           NaN
2  Cristiano   Black           NaN
3  Cristiano    Blue           NaN
4   Fernando  Yellow           NaN
5      Jonas    Pink           NaN
6    William   Green           NaN
df2:
        Name  Color Codes   Values
0      Jonas   Pink  1456  12000.0
1       Alex    Red  1450   5000.5
2  Cristiano  Black  1453  78000.0
3  Cristiano   Blue  1530   2000.0
4       Alex    Red  1459   1500.0

I want to fill the 'Codes' and 'Values' columns of df1 with the corresponding values ​​in df2 ('Code' and 'Values'), doing the matching of the columns 'Name' and 'Color'.

The problem is that there is more than one correspondence, for the column 'Codes' I want the concatenation of strings in 'Code' (separating with a comma), for the column 'Values' I want the sum of the correspondences.

I tried using merge, but this second part I don't know how to implement. To be clearer, the desired output of df1 is this:

        Name   Color        Codes   Values
0       Alex    Blue                   NaN
1       Alex     Red  1450 , 1459   6500.5
2  Cristiano   Black         1453  78000.0
3  Cristiano    Blue         1530   2000.0
4   Fernando  Yellow                   NaN
5      Jonas    Pink         1456  12000.0
6    William   Green                   NaN

CodePudding user response:

You can use pd.merge() using df1 as it is and a grouped version of df2 (grouped by Name and Color). The aggregation should be list on Codes and sum on Values:

pd.merge(df1,
         df2.groupby(['Name','Color']).agg({'Codes':list,'Values':'sum'}).reset_index(),
         how='left',
         on=['Name','Color'],
         suffixes=('_x', ''))\
    .drop(['Codes_x','Values_x'],axis=1)

Which prints back:

        Name   Color         Codes   Values
0       Alex    Blue           NaN      NaN
1       Alex     Red  [1450, 1459]   6500.5
2  Cristiano   Black        [1453]  78000.0
3  Cristiano    Blue        [1530]   2000.0
4   Fernando  Yellow           NaN      NaN
5      Jonas    Pink        [1456]  12000.0
6    William   Green           NaN      NaN
  • Related