Home > Mobile >  Create relative value column by groups of two different columns
Create relative value column by groups of two different columns

Time:10-11

I have a DataFrame df1 which contains time-series data with Date, Location, and Value. I try to create df2 with a new column Relative_Value, which is the value of the specific row devided by the sum of the values for a location on a specific date (groupby date and location).

For example:

  • Date=20220101, Location=FE, Value=4: 4/(4 2 6 4 1) = 0.235
df1:
    Date        Location    Value
0   20220101    FE          4
1   20220101    FE          2
2   20220101    FE          6
3   20220101    FE          4
4   20220101    FE          1
5   20220101    RP          4
6   20220101    RP          6
7   20220101    RP          4
8   20220102    FE          1
9   20220102    FE          3
10  20220102    FE          4
11  20220102    FE          1
12  20220102    RP          8
13  20220102    RP          4
14  20220102    RP          5
15  20220102    RP          9

df2:
    Date        Location    Value   Relativ_Value
0   20220101    FE          4       0.235          <-- example
1   20220101    FE          2       0.117
2   20220101    FE          6       0.352
3   20220101    FE          4       0.235
4   20220101    FE          1       0.058
5   20220101    RP          4       0.285
6   20220101    RP          6       0.428
7   20220101    RP          4       0.285
8   20220102    FE          1       0.111
9   20220102    FE          3       0.333
10  20220102    FE          4       0.444
11  20220102    FE          1       0.111
12  20220102    RP          8       0.307
13  20220102    RP          4       0.153
14  20220102    RP          5       0.192
15  20220102    RP          9       0.346

I tried it by first groupby the two columns and the apply function:

df2 = df1.groupby(['Date', 'Location']).apply(lambda x: x/sum(x))

What would be an efficient approach to create df2?

For reproducability:

import pandas as pd
df1 = pd.DataFrame({
    'Date':[20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102],
    'Location':['FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'RP'],
    'Value':[4, 2, 6, 4, 1, 4, 6, 4, 1, 3, 4, 1, 8, 4, 5, 9]})

Many thanks!

CodePudding user response:

Use GroupBy.transform for improve performance:

df2 = df.assign(Relativ_Value=df['Value']
                        .div(df1.groupby(['Date', 'Location'])['Value'].transform('sum'))
  • Related