Subtracting Pivot Tables with different dimensions-CodePudding

I have 2 pivot tables in pandas pre_pt and sd_pt which may not have the same columns or index values. For example,

pre_pt=
risk_type     R1             R2
qualifier                            
A        1.958512e 06 -10718.288787
B       -1.008596e 04      1.933457
C        0.000000e 00      0.329764
D        2.390952e 03   5726.806464
E        1.002147e 04     -4.661991
F       -2.016144e 06  12807.479302

sd_pt=
risk_type     R1               R3           R4
qualifier                                          
A        1.936494e 06      0.000000 -10425.198385
B       -1.010489e 04      0.000000      1.107070
C        0.000000e 00      0.000000      0.568966
D        2.648684e 03      0.000000   5640.661105
E        1.001735e 04      0.000000     -3.839769
F       -2.006834e 06      0.000000  12633.668916
G        0.000000e 00  11589.966215      0.000000

I want to be able to get the difference between the two in a new data frame so that I can then concatenate the 3 data frames into a report, I need to fill in the elements that are not in either data-frame to zero so that they can then be subtracted. I do this with the following code. Which works but wanted to know if there was an inbuilt solution in pandas

 # create a set of the columns and indices
                my_cols = set()
                my_qs = set()
                my_pts = [pre_pt, sd_pt]

                for pts in my_pts:
                    my_cols.update(pts.columns.tolist())
                    my_qs.update(pts.index.tolist())

                #now add the cols, indices that don't exist
                for pts in my_pts:
                    pts_cols = pts.columns.tolist()
                    pts_qs = pts.index.tolist()
                    for c in my_cols:
                        if c not in pts_cols:
                            pts[c] = 0.0

                    for q in my_qs:
                        if q not in pts_qs:
                            pts.loc[q] = 0.0

                    pts = pts.sort_index(axis=1)
                    pts = pts.sort_index()

                diff = sd_pt - pre_pt

                #concatenate all of the pivot tables
                risk_sd = pd.concat([pre_pt, sd_pt, diff], axis = 1, keys = [pre_start_date, start_date, "Difference"], sort =True)

CodePudding user response：

You can use sub:

diff = sd_pt.sub(pre_pt, fill_value=0)
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1, sort=True,
                    keys=['pre_start_date', 'start_date', 'Difference'])
print(risk_sd)

# Output
          pre_start_date                 start_date                             Difference                                          
risk_type             R1            R2           R1            R3            R4         R1            R2            R3            R4
qualifier                                                                                                                           
A            1958512.000 -10718.288787  1936494.000      0.000000 -10425.198385 -22018.000  10718.288787      0.000000 -10425.198385
B             -10085.960      1.933457   -10104.890      0.000000      1.107070    -18.930     -1.933457      0.000000      1.107070
C                  0.000      0.329764        0.000      0.000000      0.568966      0.000     -0.329764      0.000000      0.568966
D               2390.952   5726.806464     2648.684      0.000000   5640.661105    257.732  -5726.806464      0.000000   5640.661105
E              10021.470     -4.661991    10017.350      0.000000     -3.839769     -4.120      4.661991      0.000000     -3.839769
F           -2016144.000  12807.479302 -2006834.000      0.000000  12633.668916   9310.000 -12807.479302      0.000000  12633.668916
G                    NaN           NaN        0.000  11589.966215      0.000000      0.000           NaN  11589.966215      0.000000

Append .fillna(0) after pd.concat(...) to fill NaN by 0 if needed.