Home > Mobile >  Grouping columns of dataframe by other dataframe and calculate weighted average of aggregated column
Grouping columns of dataframe by other dataframe and calculate weighted average of aggregated column

Time:11-18

I have a DataFrame df1 with four time series. I want to aggregate these time-series based on the DataFrame groups which classifies the individual time-series into two groups. Additionally, I have a DataFrame weights, which defines the weighting factor (change over time) of the time-series within the groups.

What I try to get is a DataFrame df2 which has the time-series aggregated based on the groups and weighted with the weighting factors.

This would be a simple example:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 2.3, 0.3, 0.5], 
    '02K W':[3.5, 0.1, 'nan', 'nan'], 
    '03K W':[4.2, 5.2, 2.5, 3.0], 
    '04K W':[1.5, 2.6, 8.2, 4.2]}) 

groups = pd.DataFrame({
    'ID':['01K W', '02K W', '03K W', '04K W'],
    'Group':['Group1', 'Group1', 'Group2', 'Group1']}) 

weights = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[0.5, 0.5, 0.25, 0.5], 
    '02K W':[0.25, 0.25, 'nan', 'nan'], 
    '03K W':[1, 1, 1, 1], 
    '04K W':[0.25, 0.25, 0.75, 0.5]}) 

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Group1':[1.85, 1.82, 6.23, 2.35], 
    'Group2':[4.2, 5.2, 2.5, 3.0]})

The first element in df2 (=1.85) is calculated as: 1.2x0.5 3.5x0.25 1.5x0.25

Values of group 1: 1.2, 3.5 & 1.5 ('2021-01-01')

Weighting factors for group 1: 0.5, 0.25 & 0.25 ('2021-01-01')

CodePudding user response:

  1. Multiply df1 with weights
  2. stack and map the column names to the groups
  3. groupby "Date" and group ("level_1") and sum
  4. unstack and format to the desired output
output = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)
output = (output.groupby([output.index, 
                          output["level_1"].map(dict(zip(groups["ID"],groups["Group"])))])
          .sum()
          .unstack()
          .droplevel(0,1)
          .rename_axis(None, axis=1)
          )

>>> output
            Group1  Group2
Date                      
2021-01-01   1.850     4.2
2021-01-02   1.825     5.2
2021-01-03   6.225     2.5
2021-01-04   2.350     3.0
  • Related