how to get "expected_column"? the dataframe is sort_values by time, how to calculate sum of last 'n' rows for each row group by 'ID' column, value of n in 'rolling' columns. One ID always have the same rolling, 2 IDs can have same rolling.
dct_data = {'ID': {0: 'a',
1: 'a',
2: 'a',
3: 'a',
4: 'a',
5: 'b',
6: 'b',
7: 'b',
8: 'b',
9: 'b'},
'time': {0: '2022-12-23 14:56:00',
1: '2022-12-23 14:57:00',
2: '2022-12-23 14:58:00',
3: '2022-12-23 14:59:00',
4: '2022-12-23 15:00:00',
5: '2022-12-23 14:56:00',
6: '2022-12-23 14:57:00',
7: '2022-12-23 14:58:00',
8: '2022-12-23 14:59:00',
9: '2022-12-23 15:00:00'},
'rolling': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2},
'value': {0: 19, 1: 14, 2: 14, 3: 32, 4: 16, 5: 0, 6: 6, 7: 1, 8: 4, 9: 3} }
df_test = pd.DataFrame(dct_data)
expected_column = [0 0 19, 0 19 14,19 14 14,14 14 32,14 32 16,0 0,0 6,6 1,1 4,4 3]
df_test['expected_column'] = expected_column
[output]
ID time rolling value expected_column
a 2022-12-23 14:56:00 3 19 19
a 2022-12-23 14:57:00 3 14 33
a 2022-12-23 14:58:00 3 14 47
a 2022-12-23 14:59:00 3 32 60
a 2022-12-23 15:00:00 3 16 62
b 2022-12-23 14:56:00 2 0 0
b 2022-12-23 14:57:00 2 6 6
b 2022-12-23 14:58:00 2 1 7
b 2022-12-23 14:59:00 2 4 5
b 2022-12-23 15:00:00 2 3 7
CodePudding user response:
Try the following. Do note, that this works for cases (like your example) where each ID has only one possible rolling window.
You need to first create a function that would apply over each "group" that is created by the ID variable. This would basically take the value columns, apply
.rolling
sum but take the corresponding value inrolling
columnYou then need to apply this to a
groupby
to fetch the column that you need.Finally you need to set it as a new column in your original dataframe.
f = lambda x: x['value'].rolling(x['rolling'].unique()[0], min_periods=1).sum()
df_test['computed_columns'] = df_test.groupby('ID').apply(f).tolist()
print(df_test)
ID time rolling value expected_column computed_columns
0 a 2022-12-23 14:56:00 3 19 19 19.0
1 a 2022-12-23 14:57:00 3 14 33 33.0
2 a 2022-12-23 14:58:00 3 14 47 47.0
3 a 2022-12-23 14:59:00 3 32 60 60.0
4 a 2022-12-23 15:00:00 3 16 62 62.0
5 b 2022-12-23 14:56:00 2 0 0 0.0
6 b 2022-12-23 14:57:00 2 6 6 6.0
7 b 2022-12-23 14:58:00 2 1 7 7.0
8 b 2022-12-23 14:59:00 2 4 5 5.0
9 b 2022-12-23 15:00:00 2 3 7 7.0