I have a DataFrame df1
with four time series. I want to aggregate these time-series based on the DataFrame groups
which classifies the individual time-series into two groups. Additionally, I have a DataFrame weights
, which defines the weighting factor (change over time) of the time-series within the groups.
What I try to get is a DataFrame df2
which has the time-series aggregated based on the groups and weighted with the weighting factors.
This would be a simple example:
import pandas as pd
df1 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':[1.2, 2.3, 0.3, 0.5],
'02K W':[3.5, 0.1, 'nan', 'nan'],
'03K W':[4.2, 5.2, 2.5, 3.0],
'04K W':[1.5, 2.6, 8.2, 4.2]})
groups = pd.DataFrame({
'ID':['01K W', '02K W', '03K W', '04K W'],
'Group':['Group1', 'Group1', 'Group2', 'Group1']})
weights = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':[0.5, 0.5, 0.25, 0.5],
'02K W':[0.25, 0.25, 'nan', 'nan'],
'03K W':[1, 1, 1, 1],
'04K W':[0.25, 0.25, 0.75, 0.5]})
df2 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Group1':[1.85, 1.82, 6.23, 2.35],
'Group2':[4.2, 5.2, 2.5, 3.0]})
The first element in df2
(=1.85) is calculated as:
1.2x0.5 3.5x0.25 1.5x0.25
Values of group 1: 1.2, 3.5 & 1.5 ('2021-01-01')
Weighting factors for group 1: 0.5, 0.25 & 0.25 ('2021-01-01')
CodePudding user response:
- Multiply
df1
withweights
stack
andmap
the column names to the groupsgroupby
"Date" and group ("level_1") andsum
unstack
and format to the desired output
output = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)
output = (output.groupby([output.index,
output["level_1"].map(dict(zip(groups["ID"],groups["Group"])))])
.sum()
.unstack()
.droplevel(0,1)
.rename_axis(None, axis=1)
)
>>> output
Group1 Group2
Date
2021-01-01 1.850 4.2
2021-01-02 1.825 5.2
2021-01-03 6.225 2.5
2021-01-04 2.350 3.0