I have 2 pivot tables in pandas pre_pt and sd_pt which may not have the same columns or index values. For example,
pre_pt=
risk_type R1 R2
qualifier
A 1.958512e 06 -10718.288787
B -1.008596e 04 1.933457
C 0.000000e 00 0.329764
D 2.390952e 03 5726.806464
E 1.002147e 04 -4.661991
F -2.016144e 06 12807.479302
sd_pt=
risk_type R1 R3 R4
qualifier
A 1.936494e 06 0.000000 -10425.198385
B -1.010489e 04 0.000000 1.107070
C 0.000000e 00 0.000000 0.568966
D 2.648684e 03 0.000000 5640.661105
E 1.001735e 04 0.000000 -3.839769
F -2.006834e 06 0.000000 12633.668916
G 0.000000e 00 11589.966215 0.000000
I want to be able to get the difference between the two in a new data frame so that I can then concatenate the 3 data frames into a report, I need to fill in the elements that are not in either data-frame to zero so that they can then be subtracted. I do this with the following code. Which works but wanted to know if there was an inbuilt solution in pandas
# create a set of the columns and indices
my_cols = set()
my_qs = set()
my_pts = [pre_pt, sd_pt]
for pts in my_pts:
my_cols.update(pts.columns.tolist())
my_qs.update(pts.index.tolist())
#now add the cols, indices that don't exist
for pts in my_pts:
pts_cols = pts.columns.tolist()
pts_qs = pts.index.tolist()
for c in my_cols:
if c not in pts_cols:
pts[c] = 0.0
for q in my_qs:
if q not in pts_qs:
pts.loc[q] = 0.0
pts = pts.sort_index(axis=1)
pts = pts.sort_index()
diff = sd_pt - pre_pt
#concatenate all of the pivot tables
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis = 1, keys = [pre_start_date, start_date, "Difference"], sort =True)
CodePudding user response:
You can use sub
:
diff = sd_pt.sub(pre_pt, fill_value=0)
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1, sort=True,
keys=['pre_start_date', 'start_date', 'Difference'])
print(risk_sd)
# Output
pre_start_date start_date Difference
risk_type R1 R2 R1 R3 R4 R1 R2 R3 R4
qualifier
A 1958512.000 -10718.288787 1936494.000 0.000000 -10425.198385 -22018.000 10718.288787 0.000000 -10425.198385
B -10085.960 1.933457 -10104.890 0.000000 1.107070 -18.930 -1.933457 0.000000 1.107070
C 0.000 0.329764 0.000 0.000000 0.568966 0.000 -0.329764 0.000000 0.568966
D 2390.952 5726.806464 2648.684 0.000000 5640.661105 257.732 -5726.806464 0.000000 5640.661105
E 10021.470 -4.661991 10017.350 0.000000 -3.839769 -4.120 4.661991 0.000000 -3.839769
F -2016144.000 12807.479302 -2006834.000 0.000000 12633.668916 9310.000 -12807.479302 0.000000 12633.668916
G NaN NaN 0.000 11589.966215 0.000000 0.000 NaN 11589.966215 0.000000
Append .fillna(0)
after pd.concat(...)
to fill NaN
by 0
if needed.