I have the below data frame
sector__description industry__description weight
REAL ESTATE REAL ESTATE 0.564507
INDUSTRIALS INDUSTRIAL PRODUCTS 0.894072
TECHNOLOGY TECH HARDWARE & SEMICONDUCTORS 3.633277
CLOSED END FUND FUND_OBJECTIVE 2.276752
FINANCIALS FINANCIAL SERVICES 1.145301
ENERGY OIL & GAS 3.099118
TECHNOLOGY TECH HARDWARE & SEMICONDUCTORS 0.987115
FINANCIALS BANKING 6.034223
HEALTH CARE HEALTH CARE 1.134993
UTILITIES UTILITIES 1.345726
HEALTH CARE HEALTH CARE 1.117600
CLOSED END FUND CLOSED END FUND 0.063387
FINANCIALS INSURANCE 1.275847
CONSUMER STAPLES CONSUMER STAPLE PRODUCTS 0.634256
FINANCIALS INSURANCE 15.541616
FINANCIALS INSURANCE 4.777598
REAL ESTATE REAL ESTATE 0.647987
MONEY MARKET MONEY MARKET 2.634138
MONEY MARKET MONEY MARKET 2.637100
REAL ESTATE REAL ESTATE 0.788146
CONSUMER STAPLES RETAIL & WHOLESALE - STAPLES 1.754561
FINANCIALS BANKING 2.146774
CONSUMER DISCRETIONARY CONSUMER DISCRETIONARY SERVICES 3.579005
from this I need a list like below for all the sectors
[ {'sector': 'FINANCIALS' , weight: 'total sum of all financials', sub:[{'sector': 'FINANCIAL SERVICES' , weight: 'total sum of all financial services'},{'sector': 'INSURANCE' , weight: 'total sum of all insurance'}, {'sector': 'BANKING' , weight: 'total sum of all banking']}, ..... ]
CodePudding user response:
Use list comprehension for your custom format with aggregate sum
per both levels:
out = [{'sector': k,
'weight': v['weight'].sum(),
'sub': v.groupby('industry__description')['weight']
.sum()
.rename_axis('sector')
.reset_index().to_dict(orient='records')}
for k, v in df.groupby('sector__description')]
print (out)
[
{
"sector":"CLOSED END FUND",
"weight":2.340139,
"sub":[
{
"sector":"CLOSED END FUND",
"weight":0.063387
},
{
"sector":"FUND_OBJECTIVE",
"weight":2.276752
}
]
},
{
"sector":"CONSUMER DISCRETIONARY",
"weight":3.579005,
"sub":[
{
"sector":"CONSUMER DISCRETIONARY SERVICES",
"weight":3.579005
}
]
},
{
"sector":"CONSUMER STAPLES",
"weight":2.388817,
"sub":[
{
"sector":"CONSUMER STAPLE PRODUCTS",
"weight":0.634256
},
{
"sector":"RETAIL & WHOLESALE - STAPLES",
"weight":1.754561
}
]
},
{
"sector":"ENERGY",
"weight":3.099118,
"sub":[
{
"sector":"OIL & GAS",
"weight":3.099118
}
]
},
{
"sector":"FINANCIALS",
"weight":30.921359,
"sub":[
{
"sector":"BANKING",
"weight":8.180997
},
{
"sector":"FINANCIAL SERVICES",
"weight":1.145301
},
{
"sector":"INSURANCE",
"weight":21.595061
}
]
},
{
"sector":"HEALTH CARE",
"weight":2.252593,
"sub":[
{
"sector":"HEALTH CARE",
"weight":2.252593
}
]
},
{
"sector":"INDUSTRIALS",
"weight":0.894072,
"sub":[
{
"sector":"INDUSTRIAL PRODUCTS",
"weight":0.894072
}
]
},
{
"sector":"MONEY MARKET",
"weight":5.271238,
"sub":[
{
"sector":"MONEY MARKET",
"weight":5.271238
}
]
},
{
"sector":"REAL ESTATE",
"weight":2.0006399999999998,
"sub":[
{
"sector":"REAL ESTATE",
"weight":2.0006399999999998
}
]
},
{
"sector":"TECHNOLOGY",
"weight":4.620392,
"sub":[
{
"sector":"TECH HARDWARE & SEMICONDUCTORS",
"weight":4.620392
}
]
},
{
"sector":"UTILITIES",
"weight":1.345726,
"sub":[
{
"sector":"UTILITIES",
"weight":1.345726
}
]
}
]
EDIT:
out = [{'n': k,
'v': v['weight'].sum(),
'sub': v.groupby('industry__description')['weight']
.sum()
.sort_values()
.rename_axis('n')
.reset_index(name='v')
.to_dict(orient='records')}
for k, v in df.groupby('sector__description')]
print (out)