I have an DataFrame like this:
item_id facility_id timestamp min_inv current_inv demand
0 12 1185 2022-01-11 00:00:00 2.0 3.0 1
1 12 1185 2022-01-11 01:00:00 2.0 3.0 1
2 12 1185 2022-01-11 02:00:00 2.0 3.0 2
3 12 1185 2022-01-11 03:00:00 2.0 3.0 2
4 12 1185 2022-01-11 04:00:00 2.0 3.0 1
5 23 1185 2022-01-11 00:00:00 2.0 3.0 1
6 23 1185 2022-01-11 01:00:00 2.0 3.0 2
7 23 1185 2022-01-11 02:00:00 2.0 3.0 1
8 23 1185 2022-01-11 03:00:00 2.0 3.0 2
9 23 1185 2022-01-11 04:00:00 2.0 3.0 1
10 34 1185 2022-01-11 00:00:00 2.0 3.0 1
11 34 1185 2022-01-11 01:00:00 2.0 3.0 1
12 34 1185 2022-01-11 02:00:00 2.0 3.0 4
13 34 1185 2022-01-11 03:00:00 2.0 3.0 2
14 34 1185 2022-01-11 04:00:00 2.0 3.0 2
15 45 1185 2022-01-11 00:00:00 2.0 3.0 3
16 45 1185 2022-01-11 01:00:00 2.0 3.0 2
17 45 1185 2022-01-11 02:00:00 2.0 3.0 3
18 45 1185 2022-01-11 03:00:00 2.0 3.0 1
19 45 1185 2022-01-11 04:00:00 2.0 3.0 3
I want to groupby item_id
and resample it for each 2 hour. In the resample, I want to use different aggfunc for different columns.
The way I'm doing this is:
tmp = input_df.groupby('item_id')
new_df = pd.DataFrame()
for group, df in tmp:
tmp_df = df.set_index('timestamp').resample('2H').agg({'current_inv':'last',
'min_inv':'last',
'demand':'sum'})
tmp_df['item_id'] = group
new_df = pd.concat([new_df, tmp_df])
new_df
gives the output.
Is there more pythonic way for doing this? I don't want to use for
loop.
CodePudding user response:
IIUC, you can use on='timestamp'
as parameter of resample
method to get your expect result in a vectorized way:
out = df.groupby('item_id').resample('2H', on='timestamp') \
.agg({'current_inv': 'last', 'min_inv':'last', 'demand':'sum'}).reset_index()
print(out)
# Output
item_id timestamp current_inv min_inv demand
0 12 2022-01-11 00:00:00 3.0 2.0 2
1 12 2022-01-11 02:00:00 3.0 2.0 4
2 12 2022-01-11 04:00:00 3.0 2.0 1
3 23 2022-01-11 00:00:00 3.0 2.0 3
4 23 2022-01-11 02:00:00 3.0 2.0 3
5 23 2022-01-11 04:00:00 3.0 2.0 1
6 34 2022-01-11 00:00:00 3.0 2.0 2
7 34 2022-01-11 02:00:00 3.0 2.0 6
8 34 2022-01-11 04:00:00 3.0 2.0 2
9 45 2022-01-11 00:00:00 3.0 2.0 5
10 45 2022-01-11 02:00:00 3.0 2.0 4
11 45 2022-01-11 04:00:00 3.0 2.0 3