Home > Software design >  adding sub totals to dataframe
adding sub totals to dataframe

Time:07-07

I have a data-frame (df):

          Sales Tax
A1          109   8
A2          116  10
A3          104   7
A total       0   0
B1          110   9
B2          130   8
B total       0   0
C           113  16
D1          135  15
D2          110  16
D3          130  11
D4          151  17
D total       0   0

As you can see the sub total rows (e.g. A Total) do not have a totals for the rows of the data frame that they are associated with. For example Sales for A total should be the sum of Sales for A1,A2 and A3 (109,116 and 104 summing to 329).

Is it possible to return the following:

        Sales   Tax
A1        109     8
A2        116    10
A3        104     7
A total   329    25
B1        110     9
B2        130     8
B total   240    17
C         113    16
D1        135    15
D2        110    16
D3        130    11
D4        151    17
D total   526    59

where any row with 'total' has the sub totals for their respective parts of the dataframe?

CodePudding user response:

You could get the sum of all the groups, even the ones you might not use (like C) and then use the resulting data to update the original df.

import pandas as pd
df = pd.DataFrame({'Sales': {'A1': 109,
  'A2': 116,
  'A3': 104,
  'A total': 0,
  'B1': 110,
  'B2': 130,
  'B total': 0,
  'C': 113,
  'D1': 135,
  'D2': 110,
  'D3': 130,
  'D4': 151,
  'D total': 0},
 'Tax': {'A1': 8,
  'A2': 10,
  'A3': 7,
  'A total': 0,
  'B1': 9,
  'B2': 8,
  'B total': 0,
  'C': 16,
  'D1': 15,
  'D2': 16,
  'D3': 11,
  'D4': 17,
  'D total': 0}})

t = df.groupby(df.index.str[0]).sum()
t.index  = ' total'

df.update(t)

print(df)

Output

    Sales   Tax
A1       109.0   8.0
A2       116.0  10.0
A3       104.0   7.0
A total  329.0  25.0
B1       110.0   9.0
B2       130.0   8.0
B total  240.0  17.0
C        113.0  16.0
D1       135.0  15.0
D2       110.0  16.0
D3       130.0  11.0
D4       151.0  17.0
D total  526.0  59.0

CodePudding user response:

You can form the groups based on the total (each group ends with "total"), then compute the sum and replace with boolean indexing:

# which indices contain "total"?
mask = df.index.str.contains('total')
# form groups using a reverse cumsum
group = mask[::-1].cumsum()[::-1]
# update the total rows with the sum per group
df.loc[mask] = df.groupby(group).transform('sum').loc[mask]

output:

         Sales  Tax
A1         109    8
A2         116   10
A3         104    7
A total    329   25
B1         110    9
B2         130    8
B total    240   17
C          113   16
D1         135   15
D2         110   16
D3         130   11
D4         151   17
D total    639   75
  • Related