I have a dataframe df
,
ID MID TimeType Starttime Endtime next_Starttime
333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44
333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00
333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36
333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43
333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44
333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01
333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01
I would like to assign a unique id for the column TimeType
based on the sequence of repetition such that the desired output looks like this,
ID MID TimeType Starttime Endtime next_Starttime unique_id
333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44 1
333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00 2
333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36 2
333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43 3
333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44 4
333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01 4
333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01 4
I tried using for-loops but based on the dataframe size the execution is too slow.
CodePudding user response:
You can use shift
to compare the consecutive rows and cumsum
to make a count:
df['unique_id'] = df['TimeType'].ne(df['TimeType'].shift()).cumsum()
output:
ID MID TimeType Starttime Endtime next_Starttime unique_id
0 333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44 1
1 333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00 2
2 333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36 2
3 333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43 3
4 333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44 4
5 333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01 4
6 333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01 4
CodePudding user response:
You can do
df['uni_id'] = df['TimeType'].ne(df['TimeType']).cumsum()
CodePudding user response:
import pdrle
pdrle.get_id(df.TimeType)
# 0 0
# 1 1
# 2 1
# 3 2
# 4 3
# 5 3
# 6 3
# dtype: int64