cumsum for each row according to different period on pandas-CodePudding

how to get "expected_column"? the dataframe is sort_values by time, how to calculate sum of last 'n' rows for each row group by 'ID' column, value of n in 'rolling' columns. One ID always have the same rolling, 2 IDs can have same rolling.

dct_data = {'ID': {0: 'a',
                   1: 'a',
                   2: 'a',
                   3: 'a',
                   4: 'a',
                   5: 'b',
                   6: 'b',
                   7: 'b',
                   8: 'b',
                   9: 'b'},
            'time': {0: '2022-12-23 14:56:00',
                     1: '2022-12-23 14:57:00',
                     2: '2022-12-23 14:58:00',
                     3: '2022-12-23 14:59:00',
                     4: '2022-12-23 15:00:00',
                     5: '2022-12-23 14:56:00',
                     6: '2022-12-23 14:57:00',
                     7: '2022-12-23 14:58:00',
                     8: '2022-12-23 14:59:00',
                     9: '2022-12-23 15:00:00'},
            'rolling': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2},
            'value': {0: 19, 1: 14, 2: 14, 3: 32, 4: 16, 5: 0, 6: 6, 7: 1, 8: 4, 9: 3} }

df_test = pd.DataFrame(dct_data)
expected_column = [0 0 19, 0 19 14,19 14 14,14 14 32,14 32 16,0 0,0 6,6 1,1 4,4 3]
df_test['expected_column'] = expected_column

[output]
ID  time                    rolling   value  expected_column
a   2022-12-23 14:56:00     3         19     19
a   2022-12-23 14:57:00     3         14     33
a   2022-12-23 14:58:00     3         14     47
a   2022-12-23 14:59:00     3         32     60
a   2022-12-23 15:00:00     3         16     62
b   2022-12-23 14:56:00     2         0      0
b   2022-12-23 14:57:00     2         6      6
b   2022-12-23 14:58:00     2         1      7
b   2022-12-23 14:59:00     2         4      5
b   2022-12-23 15:00:00     2         3      7

CodePudding user response：

Try the following. Do note, that this works for cases (like your example) where each ID has only one possible rolling window.

You need to first create a function that would apply over each "group" that is created by the ID variable. This would basically take the value columns, apply .rolling sum but take the corresponding value in rolling column
You then need to apply this to a groupby to fetch the column that you need.
Finally you need to set it as a new column in your original dataframe.

f = lambda x: x['value'].rolling(x['rolling'].unique()[0], min_periods=1).sum()
df_test['computed_columns'] = df_test.groupby('ID').apply(f).tolist()

print(df_test)

  ID                 time  rolling  value  expected_column  computed_columns
0  a  2022-12-23 14:56:00        3     19               19              19.0
1  a  2022-12-23 14:57:00        3     14               33              33.0
2  a  2022-12-23 14:58:00        3     14               47              47.0
3  a  2022-12-23 14:59:00        3     32               60              60.0
4  a  2022-12-23 15:00:00        3     16               62              62.0
5  b  2022-12-23 14:56:00        2      0                0               0.0
6  b  2022-12-23 14:57:00        2      6                6               6.0
7  b  2022-12-23 14:58:00        2      1                7               7.0
8  b  2022-12-23 14:59:00        2      4                5               5.0
9  b  2022-12-23 15:00:00        2      3                7               7.0