Pandas: create a period based on date column

Time:09-17

I have a DataFrame:

ID              datetime 
11              01-09-2021 10:00:00
11              01-09-2021 10:15:15
11              01-09-2021 15:00:00
12              01-09-2021 15:10:00
11              01-09-2021 18:00:00
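To make the example reproducible, here is a minimal sketch that builds the sample frame above. It assumes the date strings are month-first (MM-DD-YYYY), which matches how the answer's output renders them (2021-01-09). If they are really day-first, swap the format to '%d-%m-%Y %H:%M:%S'. The time gaps are the same either way because every row falls on the same date.

```python
import pandas as pd

# Sample data from the question (date order assumed to be month-first).
df = pd.DataFrame({
    'ID': [11, 11, 11, 12, 11],
    'datetime': ['01-09-2021 10:00:00', '01-09-2021 10:15:15',
                 '01-09-2021 15:00:00', '01-09-2021 15:10:00',
                 '01-09-2021 18:00:00'],
})

# Parse with an explicit format so the day/month order is not guessed.
df['datetime'] = pd.to_datetime(df['datetime'], format='%m-%d-%Y %H:%M:%S')
print(df)
```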

I need to add a period column based only on datetime: the period number should increase by 1 whenever the gap from the previous row is more than 2 hours.

ID              datetime                 period
11              01-09-2021 10:00:00        1
11              01-09-2021 10:15:15        1
11              01-09-2021 15:00:00        2
12              01-09-2021 15:10:00        2
11              01-09-2021 18:00:00        3

I also need the same thing computed per ID, i.e. based on both ID and datetime:

ID              datetime                 period
11              01-09-2021 10:00:00        1
11              01-09-2021 10:15:15        1
11              01-09-2021 15:00:00        2
12              01-09-2021 15:10:00        1
11              01-09-2021 18:00:00        3
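Stated as a plain loop, the rule is: start at period 1 and increment the period whenever the time since the previous row (the previous row with the same ID, for the second table) is more than 2 hours. The sketch below is a hypothetical pure-Python reference implementation, useful for checking a vectorized version. `assign_periods` is an illustrative name; it assumes rows are already in time order and treats a gap of exactly 2 hours as the same period.

```python
from datetime import datetime, timedelta


def assign_periods(rows, gap=timedelta(hours=2), by_id=False):
    """Hypothetical helper: number periods 1, 2, 3, ... and start a new one
    whenever the time since the previous row (of the same ID if by_id)
    is strictly greater than `gap`. Rows must be in time order."""
    last_time = {}  # key -> timestamp of the previous row with that key
    period = {}     # key -> current period number for that key
    result = []
    for row_id, ts in rows:
        key = row_id if by_id else None  # one shared key when not grouping
        if key not in last_time:
            period[key] = 1              # first row for this key
        elif ts - last_time[key] > gap:
            period[key] += 1             # gap too large: new period
        last_time[key] = ts
        result.append(period[key])
    return result


raw = [(11, '01-09-2021 10:00:00'), (11, '01-09-2021 10:15:15'),
       (11, '01-09-2021 15:00:00'), (12, '01-09-2021 15:10:00'),
       (11, '01-09-2021 18:00:00')]
rows = [(i, datetime.strptime(s, '%m-%d-%Y %H:%M:%S')) for i, s in raw]

print(assign_periods(rows))              # [1, 1, 2, 2, 3]
print(assign_periods(rows, by_id=True))  # [1, 1, 2, 1, 3]
```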

How can I do that?

Answer:

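You can get the difference between consecutive rows with Series.diff, convert it to hours with Series.dt.total_seconds divided by 3600, compare it with 2 using gt, and take the cumulative sum of the resulting booleans (adding 1 so periods start at 1):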

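import pandas as pd
df['datetime'] = pd.to_datetime(df['datetime'])  # no-op if already datetime64; '01-09-2021' parses month-first
df['period'] = df['datetime'].diff().dt.total_seconds().div(3600).gt(2).cumsum().add(1)
print(df)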
   ID            datetime  period
0  11 2021-01-09 10:00:00       1
1  11 2021-01-09 10:15:15       1
2  11 2021-01-09 15:00:00       2
3  12 2021-01-09 15:10:00       2
4  11 2021-01-09 18:00:00       3
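To see why the chain works, it can be split into its intermediate steps. This sketch rebuilds the sample data with timestamps already parsed as in the output above (2021-01-09) and keeps each step as a separate Series: the gap in hours, the boolean "new period starts here" flag, and the running count of those flags.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [11, 11, 11, 12, 11],
    'datetime': pd.to_datetime([
        '2021-01-09 10:00:00', '2021-01-09 10:15:15', '2021-01-09 15:00:00',
        '2021-01-09 15:10:00', '2021-01-09 18:00:00',
    ]),
})

# Step 1: gap to the previous row, in hours (NaN for the first row).
hours = df['datetime'].diff().dt.total_seconds().div(3600)
# Step 2: True where the gap is more than 2 hours; NaN > 2 evaluates to False.
new_period = hours.gt(2)
# Step 3: count the breaks seen so far and add 1 so numbering starts at 1.
period = new_period.cumsum().add(1)

print(pd.DataFrame({'hours': hours.round(2), 'new_period': new_period, 'period': period}))
```

Because the first row's gap is NaN and compares as False, the first row never opens a new period. Note that diff compares each row with the previous one in the current row order, so the frame should be sorted by datetime first (for example with df.sort_values('datetime')) if it is not already.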

The same idea per group, applying the chain to each ID's rows with GroupBy.transform:

# diff only sees one ID's rows at a time, so the period count restarts for each ID
f = lambda x: x.diff().dt.total_seconds().div(3600).gt(2).cumsum().add(1)
df['period'] = df.groupby('ID')['datetime'].transform(f)
print(df)
   ID            datetime  period
0  11 2021-01-09 10:00:00       1
1  11 2021-01-09 10:15:15       1
2  11 2021-01-09 15:00:00       2
3  12 2021-01-09 15:10:00       1
4  11 2021-01-09 18:00:00       3
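An alternative for the per-ID case, which is not the answer's method: take the gap with GroupBy.diff, compare it directly against pd.Timedelta(hours=2) instead of converting to hours, then take the cumulative sum within each ID. This avoids calling a Python lambda once per group, which may help when there are many IDs. The sample data is rebuilt here with parsed timestamps.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [11, 11, 11, 12, 11],
    'datetime': pd.to_datetime([
        '2021-01-09 10:00:00', '2021-01-09 10:15:15', '2021-01-09 15:00:00',
        '2021-01-09 15:10:00', '2021-01-09 18:00:00',
    ]),
})

# Gap to the previous row with the same ID (NaT for each ID's first row).
gap = df.groupby('ID')['datetime'].diff()
# A new period starts when that gap exceeds 2 hours; NaT compares as False.
is_break = gap.gt(pd.Timedelta(hours=2))
# Running count of breaks within each ID, numbered from 1.
df['period'] = is_break.astype(int).groupby(df['ID']).cumsum().add(1)
print(df)
```

As with the answer's version, each ID's rows must be in time order; sorting with df.sort_values(['ID', 'datetime']) first would ensure that, and the result still aligns by index when assigned back.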