Home > Back-end >  apply resample function on pandas groupby without using for loop
apply resample function on pandas groupby without using for loop

Time:03-03

I have an DataFrame like this:


    item_id     facility_id timestamp            min_inv     current_inv    demand
0   12          1185        2022-01-11 00:00:00  2.0         3.0            1
1   12          1185        2022-01-11 01:00:00  2.0         3.0            1
2   12          1185        2022-01-11 02:00:00  2.0         3.0            2
3   12          1185        2022-01-11 03:00:00  2.0         3.0            2
4   12          1185        2022-01-11 04:00:00  2.0         3.0            1
5   23          1185        2022-01-11 00:00:00  2.0         3.0            1
6   23          1185        2022-01-11 01:00:00  2.0         3.0            2
7   23          1185        2022-01-11 02:00:00  2.0         3.0            1
8   23          1185        2022-01-11 03:00:00  2.0         3.0            2
9   23          1185        2022-01-11 04:00:00  2.0         3.0            1
10  34          1185        2022-01-11 00:00:00  2.0         3.0            1
11  34          1185        2022-01-11 01:00:00  2.0         3.0            1
12  34          1185        2022-01-11 02:00:00  2.0         3.0            4
13  34          1185        2022-01-11 03:00:00  2.0         3.0            2
14  34          1185        2022-01-11 04:00:00  2.0         3.0            2
15  45          1185        2022-01-11 00:00:00  2.0         3.0            3
16  45          1185        2022-01-11 01:00:00  2.0         3.0            2
17  45          1185        2022-01-11 02:00:00  2.0         3.0            3
18  45          1185        2022-01-11 03:00:00  2.0         3.0            1
19  45          1185        2022-01-11 04:00:00  2.0         3.0            3

I want to groupby item_id and resample it for each 2 hour. In the resample, I want to use different aggfunc for different columns.

The way I'm doing this is:

tmp = input_df.groupby('item_id')
new_df = pd.DataFrame()
for group, df in tmp:
    tmp_df = df.set_index('timestamp').resample('2H').agg({'current_inv':'last',
                                                           'min_inv':'last',
                                                           'demand':'sum'})
    tmp_df['item_id'] = group

    new_df = pd.concat([new_df, tmp_df])

new_df gives the output.

Is there more pythonic way for doing this? I don't want to use for loop.

CodePudding user response:

IIUC, you can use on='timestamp' as parameter of resample method to get your expect result in a vectorized way:

out = df.groupby('item_id').resample('2H', on='timestamp') \
        .agg({'current_inv': 'last', 'min_inv':'last', 'demand':'sum'}).reset_index()
print(out)

# Output
    item_id           timestamp  current_inv  min_inv  demand
0        12 2022-01-11 00:00:00          3.0      2.0       2
1        12 2022-01-11 02:00:00          3.0      2.0       4
2        12 2022-01-11 04:00:00          3.0      2.0       1
3        23 2022-01-11 00:00:00          3.0      2.0       3
4        23 2022-01-11 02:00:00          3.0      2.0       3
5        23 2022-01-11 04:00:00          3.0      2.0       1
6        34 2022-01-11 00:00:00          3.0      2.0       2
7        34 2022-01-11 02:00:00          3.0      2.0       6
8        34 2022-01-11 04:00:00          3.0      2.0       2
9        45 2022-01-11 00:00:00          3.0      2.0       5
10       45 2022-01-11 02:00:00          3.0      2.0       4
11       45 2022-01-11 04:00:00          3.0      2.0       3
  • Related