I have a dataset, df, where I would like to pivot row values into column titles and create a running total:
Data
cons_power  cons_count  id  date  r_power  r_count  d_power  d_count
       500          20  aa  q122       50        2       20        1
         0           0  aa  q222       20        1        0        0
      1000          80  bb  q122        0        0       50        3
         0           0  bb  q222      100        5        0        0
Desired
id  q122_power  q122_count  q222_power  q222_count
aa         530          21         550          22
bb         950          77        1050          82
Logic
We start with the cons_power value of 500 for id 'aa' in q122, add the r_power value of 50, and subtract the d_power value of 20, which gives us 530.
We then carry the 530 forward as the baseline for q222 and repeat the process of adding r_power and subtracting d_power: 530 + 20 - 0 = 550.
The same concept is used for the count: start with a count of 20 for 'aa' in q122, add 2 and subtract 1, giving us 21.
We then carry the 21 forward as the baseline for q222 and add r_count and subtract d_count: 21 + 1 - 0 = 22.
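The carry-forward logic above, written out in plain Python for id 'aa' (just the arithmetic from the sample values, no pandas):

```python
# Running total for id 'aa': baseline + r - d, carried forward quarter to quarter
baseline_power = 500                   # cons_power in q122
q122_power = baseline_power + 50 - 20  # + r_power - d_power -> 530
q222_power = q122_power + 20 - 0       # carry 530 forward    -> 550

baseline_count = 20                    # cons_count in q122
q122_count = baseline_count + 2 - 1    # + r_count - d_count  -> 21
q222_count = q122_count + 1 - 0        # carry 21 forward     -> 22

print(q122_power, q222_power, q122_count, q222_count)
```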
Doing
df['q122_power'] = (df['r_power']
                    .add(df.groupby(['id', 'date'])['r_power'].cumsum())
                    .sub(df.groupby(['id', 'date'])['d_power'].cumsum()))
df['q222_power'] = (df['r_power']
                    .add(df.groupby(['id', 'date'])['r_power'].cumsum())
                    .sub(df.groupby(['id', 'date'])['d_power'].cumsum()))
df['q122_count'] = (df['r_count']
                    .add(df.groupby(['id', 'date'])['r_count'].cumsum())
                    .sub(df.groupby(['id', 'date'])['d_count'].cumsum()))
df['q222_count'] = (df['r_count']
                    .add(df.groupby(['id', 'date'])['r_count'].cumsum())
                    .sub(df.groupby(['id', 'date'])['d_count'].cumsum()))
Trying to find a more elegant way of producing the desired output. I have been able to solve the grouping and calculations but unsure on how to incorporate the pivot steps within the script. Any suggestion is appreciated.
CodePudding user response:
Compute power and count before the groupby, then pivot your dataframe. Finally, flatten your column MultiIndex and reset the row index.
out = df[['id', 'date']].join(
    df.assign(power=df['cons_power'] + df['r_power'] - df['d_power'],
              count=df['cons_count'] + df['r_count'] - df['d_count'])
      .groupby('id')[['power', 'count']].cumsum()
).pivot(index='id', columns='date', values=['power', 'count'])
out.columns = out.columns.to_flat_index().map('_'.join)
Output
>>> out.reset_index()
id power_q122 power_q222 count_q122 count_q222
0 aa 530 550 21 22
1 bb 950 1050 77 82
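Putting this answer together as a self-contained script, using the sample data from the question (the DataFrame construction is restated here for completeness):

```python
import pandas as pd

df = pd.DataFrame({
    'cons_power': [500, 0, 1000, 0], 'cons_count': [20, 0, 80, 0],
    'id': ['aa', 'aa', 'bb', 'bb'], 'date': ['q122', 'q222', 'q122', 'q222'],
    'r_power': [50, 20, 0, 100], 'r_count': [2, 1, 0, 5],
    'd_power': [20, 0, 50, 0], 'd_count': [1, 0, 3, 0],
})

# Per-row net change, cumulative within each id, then pivot dates into columns
out = df[['id', 'date']].join(
    df.assign(power=df['cons_power'] + df['r_power'] - df['d_power'],
              count=df['cons_count'] + df['r_count'] - df['d_count'])
      .groupby('id')[['power', 'count']].cumsum()
).pivot(index='id', columns='date', values=['power', 'count'])
out.columns = out.columns.to_flat_index().map('_'.join)
out = out.reset_index()
print(out)
```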
CodePudding user response:
We can also create a MultiIndex so that computations can be done using the top-level index groups cons, r and d:
# Columns Not To Split into MultiIndex
df = df.set_index(['id', 'date'])
# Split into MultiIndex
df.columns = df.columns.str.split('_', expand=True)
df = (
    # Do computations on the 2D top-level indexes
    (df['cons'] + df['r'] - df['d'])
    .groupby(level='id').cumsum()  # Rolling total within IDs
    .unstack(level='date')         # Convert date index level into a column level
)
# Collapse MultiIndex
df.columns = df.columns.map('_'.join)
# Restore ID column
df = df.reset_index()
df
id power_q122 power_q222 count_q122 count_q222
0 aa 530 550 21 22
1 bb 950 1050 77 82
How it works: set_index, then create the MultiIndex with str.split:
# Columns Not To Split into MultiIndex
df = df.set_index(['id', 'date'])
# Split into MultiIndex
df.columns = df.columns.str.split('_', expand=True)
cons r d
power count power count power count
id date
aa q122 500 20 50 2 20 1
q222 0 0 20 1 0 0
bb q122 1000 80 0 0 50 3
q222 0 0 100 5 0 0
We can now access the top-level indexes and do the computation (index alignment will ensure that the computations occur within the lower-level indexes power and count):
df['cons'] + df['r'] - df['d']
power count
id date
aa q122 530 21
q222 20 1
bb q122 950 77
q222 100 5
Then take the groupby cumsum to get a rolling total within each id:
(df['cons'] + df['r'] - df['d']).groupby(level='id').cumsum()
power count
id date
aa q122 530 21
q222 550 22
bb q122 950 77
q222 1050 82
unstack the date level to convert it from an index level into a column level:
(df['cons'] + df['r'] - df['d']).groupby(level='id').cumsum().unstack(level='date')
power count
date q122 q222 q122 q222
id
aa 530 550 21 22
bb 950 1050 77 82
Now the rest is just cleanup: collapse the MultiIndex with Index.map:
df.columns = df.columns.map('_'.join)
power_q122 power_q222 count_q122 count_q222
id
aa 530 550 21 22
bb 950 1050 77 82
And reset_index to restore a default range index:
df = df.reset_index()
id power_q122 power_q222 count_q122 count_q222
0 aa 530 550 21 22
1 bb 950 1050 77 82
Setup and imports:
import pandas as pd
df = pd.DataFrame({
'cons_power': [500, 0, 1000, 0], 'cons_count': [20, 0, 80, 0],
'id': ['aa', 'aa', 'bb', 'bb'], 'date': ['q122', 'q222', 'q122', 'q222'],
'r_power': [50, 20, 0, 100], 'r_count': [2, 1, 0, 5],
'd_power': [20, 0, 50, 0], 'd_count': [1, 0, 3, 0]
})
CodePudding user response:
Modify df so that any column starting with d_ has a negative sign:
df_c = pd.DataFrame({key : value * -1
if key.startswith('d_')
else value
for key, value
in df.items()})
df_c = df_c.set_index(['id', 'date'])
Filter for power and count and aggregate:
power = df_c.filter(like='power').sum(axis=1).rename('power')
count = df_c.filter(like='count').sum(axis=1).rename('count')
df_c = pd.concat([power, count], axis = 1)
Compute the cumulative sum within each id and reshape:
df_c = (df_c.groupby('id')
.cumsum()
.unstack('date')
.swaplevel(axis=1)
.rename_axis(columns=[None, None])
)
df_c.columns = df_c.columns.map('_'.join)
df_c.reset_index()
id q122_power q222_power q122_count q222_count
0 aa 530 550 21 22
1 bb 950 1050 77 82
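End to end, this answer's approach can be sketched as a single runnable script; the sample frame is assumed to be the same one built in the previous answer's setup, since it isn't restated here:

```python
import pandas as pd

df = pd.DataFrame({
    'cons_power': [500, 0, 1000, 0], 'cons_count': [20, 0, 80, 0],
    'id': ['aa', 'aa', 'bb', 'bb'], 'date': ['q122', 'q222', 'q122', 'q222'],
    'r_power': [50, 20, 0, 100], 'r_count': [2, 1, 0, 5],
    'd_power': [20, 0, 50, 0], 'd_count': [1, 0, 3, 0],
})

# Negate every d_* column so a plain row-wise sum computes cons + r - d
df_c = pd.DataFrame({key: value * -1 if key.startswith('d_') else value
                     for key, value in df.items()})
df_c = df_c.set_index(['id', 'date'])

# Row-wise sums per measure
power = df_c.filter(like='power').sum(axis=1).rename('power')
count = df_c.filter(like='count').sum(axis=1).rename('count')
df_c = pd.concat([power, count], axis=1)

# Running total within each id, then pivot dates into the column level
df_c = (df_c.groupby('id')
            .cumsum()
            .unstack('date')
            .swaplevel(axis=1)
            .rename_axis(columns=[None, None]))
df_c.columns = df_c.columns.map('_'.join)
df_c = df_c.reset_index()
print(df_c)
```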