Sample of dataframe df
I have:
date_code item_code vstore_code
1 2022-03-26 11111 N01
2 2022-03-27 11111 N01
3 2022-03-28 11111 N01
4 2022-03-29 11111 N01
5 2022-03-30 11111 N01
6 2022-03-31 11111 N01
7 2022-04-01 11111 N01
8 2022-04-08 11111 N01
9 2022-04-15 11111 N01
10 2022-04-17 11111 N01
11 2022-04-18 11111 N01
12 2022-04-19 11111 N01
13 2022-04-21 11111 N01
14 2022-04-22 11111 N01
15 2022-04-26 11111 N01
16 2022-02-01 22222 N02
17 2022-02-02 22222 N02
18 2022-02-03 22222 N02
19 2022-02-10 22222 N02
There are a lot of items and stores.
I want to create separate dataframe which will containt records of start and end 1 of each contionuous period for each item
at each store
.
Expected output:
item_code store_code start_period end_period
11111 N01 2022-03-26 2022-04-02
11111 N01 2022-04-08 2022-04-09
11111 N01 2022-04-15 2022-04-16
11111 N01 2022-04-17 2022-04-20
11111 N01 2022-04-21 2022-04-23
11111 N01 2022-04-26 2022-04-27
22222 N02 2022-02-01 2022-02-04
22222 N02 2022-02-10 2022-02-11
CodePudding user response:
You can aggregate by consecutive datetimes with compare difference by Series.diff
for not equal 1 day
with Series.cumsum
and pass to groupby
with aggregate min
and max
, last add 1 day
to end_period
column:
df['date_code'] = pd.to_datetime(df['date_code'])
g = df['date_code'].diff().dt.days.ne(1).cumsum()
df = (df.groupby(['item_code','vstore_code',g])
.agg(start_period=('date_code','min'),
end_period=('date_code','max'))
.droplevel(-1)
.reset_index()
.assign(end_period = lambda x: x['end_period'] pd.Timedelta('1 day'))
)
print (df)
0 11111 N01 2022-03-26 2022-04-02
1 11111 N01 2022-04-08 2022-04-09
2 11111 N01 2022-04-15 2022-04-16
3 11111 N01 2022-04-17 2022-04-20
4 11111 N01 2022-04-21 2022-04-23
5 11111 N01 2022-04-26 2022-04-27
6 22222 N02 2022-02-01 2022-02-04
7 22222 N02 2022-02-10 2022-02-11