how to create sum and inner sum of group by in pandas data frame-CodePudding

I have the below data frame

       sector__description            industry__description     weight
           REAL ESTATE                      REAL ESTATE       0.564507
              INDUSTRIALS              INDUSTRIAL PRODUCTS    0.894072
               TECHNOLOGY   TECH HARDWARE & SEMICONDUCTORS    3.633277
          CLOSED END FUND                   FUND_OBJECTIVE    2.276752
              FINANCIALS               FINANCIAL SERVICES     1.145301
                  ENERGY                        OIL & GAS     3.099118
               TECHNOLOGY   TECH HARDWARE & SEMICONDUCTORS    0.987115
               FINANCIALS                          BANKING    6.034223
              HEALTH CARE                      HEALTH CARE    1.134993
                UTILITIES                        UTILITIES    1.345726
             HEALTH CARE                      HEALTH CARE     1.117600
         CLOSED END FUND                  CLOSED END FUND     0.063387
             FINANCIALS                        INSURANCE      1.275847
        CONSUMER STAPLES         CONSUMER STAPLE PRODUCTS     0.634256
              FINANCIALS                        INSURANCE     15.541616
              FINANCIALS                        INSURANCE     4.777598
             REAL ESTATE                      REAL ESTATE     0.647987
            MONEY MARKET                     MONEY MARKET     2.634138
            MONEY MARKET                     MONEY MARKET     2.637100
             REAL ESTATE                      REAL ESTATE     0.788146
        CONSUMER STAPLES     RETAIL & WHOLESALE - STAPLES     1.754561
              FINANCIALS                          BANKING     2.146774
  CONSUMER DISCRETIONARY  CONSUMER DISCRETIONARY SERVICES     3.579005

from this I need a list like below for all the sectors

[ {'sector': 'FINANCIALS' , weight: 'total sum of all financials', sub:[{'sector': 'FINANCIAL SERVICES' , weight: 'total sum of all financial services'},{'sector': 'INSURANCE' , weight: 'total sum of all insurance'}, {'sector': 'BANKING' , weight: 'total sum of all banking']}, ..... ]

CodePudding user response：

Use list comprehension for your custom format with aggregate sum per both levels:

out = [{'sector': k, 
        'weight': v['weight'].sum(), 
        'sub': v.groupby('industry__description')['weight']
                .sum()
                .rename_axis('sector')
                .reset_index().to_dict(orient='records')}
       for k, v in df.groupby('sector__description')]

print (out)


[
   {
      "sector":"CLOSED END FUND",
      "weight":2.340139,
      "sub":[
         {
            "sector":"CLOSED END FUND",
            "weight":0.063387
         },
         {
            "sector":"FUND_OBJECTIVE",
            "weight":2.276752
         }
      ]
   },
   {
      "sector":"CONSUMER DISCRETIONARY",
      "weight":3.579005,
      "sub":[
         {
            "sector":"CONSUMER DISCRETIONARY SERVICES",
            "weight":3.579005
         }
      ]
   },
   {
      "sector":"CONSUMER STAPLES",
      "weight":2.388817,
      "sub":[
         {
            "sector":"CONSUMER STAPLE PRODUCTS",
            "weight":0.634256
         },
         {
            "sector":"RETAIL & WHOLESALE - STAPLES",
            "weight":1.754561
         }
      ]
   },
   {
      "sector":"ENERGY",
      "weight":3.099118,
      "sub":[
         {
            "sector":"OIL & GAS",
            "weight":3.099118
         }
      ]
   },
   {
      "sector":"FINANCIALS",
      "weight":30.921359,
      "sub":[
         {
            "sector":"BANKING",
            "weight":8.180997
         },
         {
            "sector":"FINANCIAL SERVICES",
            "weight":1.145301
         },
         {
            "sector":"INSURANCE",
            "weight":21.595061
         }
      ]
   },
   {
      "sector":"HEALTH CARE",
      "weight":2.252593,
      "sub":[
         {
            "sector":"HEALTH CARE",
            "weight":2.252593
         }
      ]
   },
   {
      "sector":"INDUSTRIALS",
      "weight":0.894072,
      "sub":[
         {
            "sector":"INDUSTRIAL PRODUCTS",
            "weight":0.894072
         }
      ]
   },
   {
      "sector":"MONEY MARKET",
      "weight":5.271238,
      "sub":[
         {
            "sector":"MONEY MARKET",
            "weight":5.271238
         }
      ]
   },
   {
      "sector":"REAL ESTATE",
      "weight":2.0006399999999998,
      "sub":[
         {
            "sector":"REAL ESTATE",
            "weight":2.0006399999999998
         }
      ]
   },
   {
      "sector":"TECHNOLOGY",
      "weight":4.620392,
      "sub":[
         {
            "sector":"TECH HARDWARE & SEMICONDUCTORS",
            "weight":4.620392
         }
      ]
   },
   {
      "sector":"UTILITIES",
      "weight":1.345726,
      "sub":[
         {
            "sector":"UTILITIES",
            "weight":1.345726
         }
      ]
   }
]

EDIT:

out = [{'n': k, 
        'v': v['weight'].sum(), 
        'sub': v.groupby('industry__description')['weight']
                .sum()
                .sort_values()
                .rename_axis('n')
                .reset_index(name='v')
                .to_dict(orient='records')}
       for k, v in df.groupby('sector__description')]

print (out)