How to combine two dataframes into one and aggregate common records in python?

Time:08-13

df1:

Date                 Code  Name         Rating  x             y             z
2022-07-27 00:00:00  OMER  OMERS        I-2     2027983745    2029539000    1555255.31
2022-07-27 00:00:00  SC    SOCIETY      I-7     389659466.4   391147968.2   1488501.805
2022-07-27 00:00:00  CD    CORPORATION  I-3     2692692761    2694172512    1479750.8
2022-07-27 00:00:00  PRIN  AGENT        I-3     72990460.96   74455570      1465109.042
2022-07-27 00:00:00  BF    FUND         S-3     277607047.4   279044540.2   1437492.761

df2:

Date                 Code          Name         Rating  x     y     z
2022-07-27 00:00:00  BankA         nan          nan     1052  1052  0
2022-07-27 00:00:00  CD            CORPORATION  I-3     1943  2000  57
2022-07-27 00:00:00  CorporationA  nan          nan     1943  3052  1109

Expected Output:

Date                 Code          Name         Rating  x             y             z
2022-07-27 00:00:00  OMER          OMERS        I-2     2027983745    2029539000    1555255.31
2022-07-27 00:00:00  SC            SOCIETY      I-7     389659466.4   391147968.2   1488501.805
2022-07-27 00:00:00  CD            CORPORATION  I-3     2692694704    2694174512    1479807.8
2022-07-27 00:00:00  PRIN          AGENT        I-3     72990460.96   74455570      1465109.042
2022-07-27 00:00:00  BF            FUND         S-3     277607047.4   279044540.2   1437492.761
2022-07-27 00:00:00  BankA         nan          nan     1052          1052          0
2022-07-27 00:00:00  CorporationA  nan          nan     1943          3052          1109
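In the expected output, CD / CORPORATION / I-3 is the only key present in both frames, so that row is the column-wise sum of the df1 row and the df2 row:

```latex
\begin{aligned}
x &= 2692692761 + 1943 = 2692694704 \\
y &= 2694172512 + 2000 = 2694174512 \\
z &= 1479750.8 + 57 = 1479807.8
\end{aligned}
```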

I want to combine df1 and df2 into one DataFrame. If a row in df2 has the same Code, Name and Rating as a row in df1, its x, y and z values should be added to that row; otherwise the df2 row should be appended to the bottom of the DataFrame. I'd appreciate your help with this. Thank you!


Answer:

You can try pd.concat followed by groupby(...).agg(...):

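import pandas as pd

cols1 = ['x', 'y', 'z']             # numeric columns to sum
cols2 = ['Code', 'Name', 'Rating']  # key columns that identify a record
# Sum the numeric columns; keep the first value of every other column (Date and the keys)
d = {col: 'sum' if col in cols1 else 'first' for col in df1.columns}

# groupby drops rows with NaN keys by default, so replace missing keys with a placeholder string
df2[cols2] = df2[cols2].fillna('NaN')
# note: groupby sorts by the key columns by default, which is why the output below is ordered by Code
out = (pd.concat([df1, df2], ignore_index=True)
       .groupby(cols2)
       .agg(d).reset_index(drop=True))
print(out)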

                  Date          Code         Name Rating              x              y           z
0  2022-07-27 00:00:00            BF         FUND    S-3  277607047.400  279044540.200 1437492.761
1  2022-07-27 00:00:00         BankA          NaN    NaN       1052.000       1052.000       0.000
2  2022-07-27 00:00:00            CD  CORPORATION    I-3 2692694704.000 2694174512.000 1479807.800
3  2022-07-27 00:00:00  CorporationA          NaN    NaN       1943.000       3052.000    1109.000
4  2022-07-27 00:00:00          OMER        OMERS    I-2 2027983745.000 2029539000.000 1555255.310
5  2022-07-27 00:00:00          PRIN        AGENT    I-3   72990460.960   74455570.000 1465109.042
6  2022-07-27 00:00:00            SC      SOCIETY    I-7  389659466.400  391147968.200 1488501.805
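The output above is sorted by Code rather than in the order shown in the expected output. A minimal variant, assuming a recent pandas (the groupby dropna parameter was added in pandas 1.1) and assuming the "nan" cells in df2 are real missing values rather than the string "nan", passes sort=False to keep groups in order of first appearance and dropna=False to keep NaN keys without the 'NaN' placeholder. The agg_spec and key_cols names are just illustrative; Date keeps the first value seen per key, as in the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question. Assumption: the "nan" cells in df2 are
# real missing values (np.nan), not the literal string "nan".
date = pd.Timestamp("2022-07-27")
df1 = pd.DataFrame({
    "Date": [date] * 5,
    "Code": ["OMER", "SC", "CD", "PRIN", "BF"],
    "Name": ["OMERS", "SOCIETY", "CORPORATION", "AGENT", "FUND"],
    "Rating": ["I-2", "I-7", "I-3", "I-3", "S-3"],
    "x": [2027983745, 389659466.4, 2692692761, 72990460.96, 277607047.4],
    "y": [2029539000, 391147968.2, 2694172512, 74455570, 279044540.2],
    "z": [1555255.31, 1488501.805, 1479750.8, 1465109.042, 1437492.761],
})
df2 = pd.DataFrame({
    "Date": [date] * 3,
    "Code": ["BankA", "CD", "CorporationA"],
    "Name": [np.nan, "CORPORATION", np.nan],
    "Rating": [np.nan, "I-3", np.nan],
    "x": [1052, 1943, 1943],
    "y": [1052, 2000, 3052],
    "z": [0, 57, 1109],
})

key_cols = ["Code", "Name", "Rating"]
# Sum the value columns; keep the first Date seen for each key.
agg_spec = {"Date": "first", "x": "sum", "y": "sum", "z": "sum"}

out = (
    pd.concat([df1, df2], ignore_index=True)
    # sort=False: groups come out in order of first appearance, i.e. df1's rows
    # first, then the df2 rows that had no match in df1.
    # dropna=False: keys with NaN Name/Rating form their own groups instead of
    # being dropped, so no placeholder string is needed.
    .groupby(key_cols, sort=False, dropna=False)
    .agg(agg_spec)
    .reset_index()
)
out = out[df1.columns]  # restore the original column order
print(out)
```

Only the CD row is merged (its x, y and z are summed); BankA and CorporationA are appended after the df1 rows with Name and Rating left as NaN.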