Get first and date after last of continuous period pandas

Time:05-05

Sample of dataframe df I have:

    date_code   item_code   vstore_code
1   2022-03-26  11111       N01
2   2022-03-27  11111       N01
3   2022-03-28  11111       N01
4   2022-03-29  11111       N01
5   2022-03-30  11111       N01
6   2022-03-31  11111       N01
7   2022-04-01  11111       N01
8   2022-04-08  11111       N01
9   2022-04-15  11111       N01
10  2022-04-17  11111       N01
11  2022-04-18  11111       N01
12  2022-04-19  11111       N01
13  2022-04-21  11111       N01
14  2022-04-22  11111       N01
15  2022-04-26  11111       N01
16  2022-02-01  22222       N02
17  2022-02-02  22222       N02
18  2022-02-03  22222       N02
19  2022-02-10  22222       N02

There are a lot of items and stores.
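For anyone who wants to reproduce this, the sample above could be built roughly as follows. This is only a sketch: it assumes `item_code` is stored as an integer and `date_code` as a string, which the printed sample does not actually confirm.

```python
import pandas as pd

# Dates for item 11111 at store N01, copied from the sample above.
dates_n01 = [
    "2022-03-26", "2022-03-27", "2022-03-28", "2022-03-29", "2022-03-30",
    "2022-03-31", "2022-04-01", "2022-04-08", "2022-04-15", "2022-04-17",
    "2022-04-18", "2022-04-19", "2022-04-21", "2022-04-22", "2022-04-26",
]
# Dates for item 22222 at store N02.
dates_n02 = ["2022-02-01", "2022-02-02", "2022-02-03", "2022-02-10"]

df = pd.DataFrame({
    "date_code": dates_n01 + dates_n02,
    # Assumption: integer item codes; the real column may hold strings.
    "item_code": [11111] * len(dates_n01) + [22222] * len(dates_n02),
    "vstore_code": ["N01"] * len(dates_n01) + ["N02"] * len(dates_n02),
})
# The sample index starts at 1 rather than 0.
df.index = range(1, len(df) + 1)
```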

I want to create a separate dataframe that contains the start and the end + 1 day of each continuous period for each item at each store.

Expected output:

item_code   vstore_code  start_period   end_period
11111       N01          2022-03-26     2022-04-02
11111       N01          2022-04-08     2022-04-09
11111       N01          2022-04-15     2022-04-16
11111       N01          2022-04-17     2022-04-20
11111       N01          2022-04-21     2022-04-23
11111       N01          2022-04-26     2022-04-27
22222       N02          2022-02-01     2022-02-04
22222       N02          2022-02-10     2022-02-11

Answer:

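You can group consecutive dates by comparing the row-to-row difference from Series.diff against 1 day: wherever the difference is not equal to 1 day a new period starts, and Series.cumsum turns those flags into a running group id. Pass that id to groupby together with item_code and vstore_code, aggregate with min and max, and finally add 1 day to the end_period column: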

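import pandas as pd

# Parse the dates so that row-to-row differences are Timedeltas
df['date_code'] = pd.to_datetime(df['date_code'])
# Start a new group id whenever the gap to the previous row is not exactly 1 day
g = df['date_code'].diff().dt.days.ne(1).cumsum()

df = (df.groupby(['item_code','vstore_code',g])
        .agg(start_period=('date_code','min'),
             end_period=('date_code','max'))
        .droplevel(-1)   # drop the helper group id level
        .reset_index()
        # end_period is the day after the last date of each run
        .assign(end_period = lambda x: x['end_period'] + pd.Timedelta('1 day'))
        )
print(df)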

   item_code vstore_code start_period end_period
0      11111         N01   2022-03-26 2022-04-02
1      11111         N01   2022-04-08 2022-04-09
2      11111         N01   2022-04-15 2022-04-16
3      11111         N01   2022-04-17 2022-04-20
4      11111         N01   2022-04-21 2022-04-23
5      11111         N01   2022-04-26 2022-04-27
6      22222         N02   2022-02-01 2022-02-04
7      22222         N02   2022-02-10 2022-02-11
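
Note that the diff above runs over the whole frame, so it relies on the rows already being sorted by item, store and date, as they are in the sample. If that ordering is not guaranteed, one possible variant (a sketch, not part of the original answer) is to sort first and take the diff inside each item/store group. The small frame and the names `keys`, `gap`, `period_id` and `periods` below are made up for illustration.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample for illustration only.
df = pd.DataFrame({
    "date_code": ["2022-03-26", "2022-03-27", "2022-04-08",
                  "2022-02-10", "2022-02-01", "2022-02-02"],
    "item_code": [11111, 11111, 11111, 22222, 22222, 22222],
    "vstore_code": ["N01", "N01", "N01", "N02", "N02", "N02"],
})
df["date_code"] = pd.to_datetime(df["date_code"])

keys = ["item_code", "vstore_code"]
# Sort so that consecutive dates of the same item/store are adjacent.
df = df.sort_values(keys + ["date_code"])

# Gap to the previous row within the same item/store;
# the first row of each group gets NaT.
gap = df.groupby(keys)["date_code"].diff()

# A new period starts wherever the gap is not exactly one day
# (NaT also compares as "not equal", so each group starts a new period).
period_id = gap.ne(pd.Timedelta(days=1)).cumsum().rename("period_id")

periods = (
    df.groupby(keys + [period_id])
      .agg(start_period=("date_code", "min"),
           end_period=("date_code", "max"))
      .droplevel(-1)      # drop the helper period_id level
      .reset_index()
)
# end_period is the day after the last date of each run.
periods["end_period"] += pd.Timedelta(days=1)
print(periods)
```

Printing `period_id` next to `date_code` is a quick way to check which rows were assigned to the same run before aggregating.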