I have a multi index dataframe that I want to insert a row into.
>>>import numpy as np
>>>import pandas as pd
>>> date = pd.date_range('2023-01-01', periods=3)
>>> size = ['s','m','total']
>>> arrays = [date, size]
>>> index = pd.MultiIndex.from_product(arrays, names=['date','size'])
>>> volume = [4,15,47,8,12,46,4,14,48]
>>> hours = [2,1,13,4,4,10,1,2,10]
>>> df = pd.DataFrame({'volume':volume, 'hours':hours}, index=index)
>>> df
volume hours
date size
2023-01-01 s 4 2
m 15 1
total 47 13
2023-01-02 s 8 4
m 12 4
total 46 10
2023-01-03 s 4 1
m 14 2
total 48 10
How can I insert a row that is labeled as 'l' for each date and is the difference between the 'total' row and the sum of the 's' and 'm' rows? The desired output is as follows:
volume hours
date size
2023-01-01 s 4 2
m 15 1
l 28 10
total 47 13
2023-01-02 s 8 4
m 12 4
l 26 2
total 46 10
2023-01-03 s 4 1
m 14 2
l 30 7
total 48 10
CodePudding user response:
You can try this:
(df.reset_index()
.pivot(index='date', columns='size')
.stack(level=0)
.assign(l = lambda x: x['total'] - (x['m'] x['s']))
.stack()
.reset_index()
.pivot(index=['date', 'size'], columns='level_1')[0]
)
level_1 hours volume
date size
2023-01-01 l 10 28
m 1 15
s 2 4
total 13 47
2023-01-02 l 2 26
m 4 12
s 4 8
total 10 46
2023-01-03 l 7 30
m 2 14
s 1 4
total 10 48