Home > Net >  How to create a unique index based on chunks of time
How to create a unique index based on chunks of time

Time:10-20

I am hoping it is possible to create a unique index or ID for sections of time in R.

I have a large dataset of second-level time data. There are breaks in time which would theoretically allow me to "group" the chunks of times and assign them a unique index or number.

I will try to create a reproducible example, but keep in mind the duration of time in chunks changes, the gaps in time are not evenly spaced, and the date could change from one day to another.

#this is what the dataframe would look like

DateTime
2021-07-12 20:28:26 CDT
2021-07-12 20:28:27 CDT
2021-07-12 20:28:28 CDT
2021-07-12 20:28:29 CDT
2021-07-12 20:28:30 CDT
2021-07-12 23:14:28 CDT
2021-07-12 23:14:29 CDT
2021-07-12 23:14:30 CDT
2021-07-12 23:14:31 CDT
2021-07-12 23:14:32 CDT
2021-07-12 23:14:33 CDT
2021-07-12 23:14:34 CDT
2021-07-12 23:14:35 CDT
2021-07-12 23:14:36 CDT
2021-07-27 17:16:05 CDT
2021-07-27 17:16:06 CDT
2021-07-27 17:16:07 CDT
2021-07-27 17:16:08 CDT
2021-07-27 17:16:09 CDT
2021-07-27 17:16:10 CDT
2021-07-27 17:16:11 CDT
2021-07-27 17:16:12 CDT
2021-07-27 17:16:13 CDT
2021-07-27 17:16:14 CDT
2021-07-27 17:16:15 CDT


#this is for reproducing time times
structure(c(1626139706, 1626139707, 1626139708, 1626139709, 1626139710, 1626149668, 1626149669, 1626149670, 1626149671, 1626149672, 1626149673, 1626149674, 1626149675, 1626149676, 1627424165, 1627424166, 1627424167, 1627424168, 1627424169, 1627424170, 1627424171, 1627424172, 1627424173, 1627424174, 1627424175), 
class = c("POSIXct", "POSIXt"), tzone = "")

Again, I hope to assign a unique number to sections/chunks of time. It would look like the following:

DateTime                 Index
2021-07-12 20:28:26 CDT     1
2021-07-12 20:28:27 CDT     1
2021-07-12 20:28:28 CDT     1
2021-07-12 20:28:29 CDT     1
2021-07-12 20:28:30 CDT     1
2021-07-12 23:14:28 CDT     2
2021-07-12 23:14:29 CDT     2
2021-07-12 23:14:30 CDT     2
2021-07-12 23:14:31 CDT     2
2021-07-12 23:14:32 CDT     2
2021-07-12 23:14:33 CDT     2
2021-07-12 23:14:34 CDT     2
2021-07-12 23:14:35 CDT     2
2021-07-12 23:14:36 CDT     2
2021-07-27 17:16:05 CDT     3
2021-07-27 17:16:06 CDT     3
2021-07-27 17:16:07 CDT     3
2021-07-27 17:16:08 CDT     3
2021-07-27 17:16:09 CDT     3
2021-07-27 17:16:10 CDT     3
2021-07-27 17:16:11 CDT     3
2021-07-27 17:16:12 CDT     3
2021-07-27 17:16:13 CDT     3
2021-07-27 17:16:14 CDT     3
2021-07-27 17:16:15 CDT     3

#edit: something like this is possibility but isn't included in the reproducible example.

DateTime                 Index
2021-07-15 23:59:59 CDT     4
2021-07-16 00:00:00 CDT     4

This is the closest thing I have found to what I am looking for: How do I create a unique ID for each night-time period across consecutive dates?

But I am not sure how to proceed. Any help would be appreciated Thank you.

CodePudding user response:

library(dplyr)
data.frame(DateTime) %>%
  mutate(Index = 1   cumsum(DateTime - lag(DateTime,1,min(DateTime)) > 60))

This would make a new group every time there is a break of 1 minute or more. Datetimes are stored "under the hood" as seconds, so a difference of 60 since the prior ('lag') value is one minute. cumsum is capturing the cumulative sum of times a break that large has occurred.

              DateTime Index
1  2021-07-12 18:28:26     1
2  2021-07-12 18:28:27     1
3  2021-07-12 18:28:28     1
4  2021-07-12 18:28:29     1
5  2021-07-12 18:28:30     1
6  2021-07-12 21:14:28     2
7  2021-07-12 21:14:29     2
8  2021-07-12 21:14:30     2
9  2021-07-12 21:14:31     2
10 2021-07-12 21:14:32     2
11 2021-07-12 21:14:33     2
12 2021-07-12 21:14:34     2
13 2021-07-12 21:14:35     2
14 2021-07-12 21:14:36     2
15 2021-07-27 15:16:05     3
16 2021-07-27 15:16:06     3
17 2021-07-27 15:16:07     3
18 2021-07-27 15:16:08     3
19 2021-07-27 15:16:09     3
20 2021-07-27 15:16:10     3
21 2021-07-27 15:16:11     3
22 2021-07-27 15:16:12     3
23 2021-07-27 15:16:13     3
24 2021-07-27 15:16:14     3
25 2021-07-27 15:16:15     3

CodePudding user response:

If we are looking for the index to increment by 1 for every minute change, then can use floor_date

library(lubridate)
library(tibble)
library(dplyr)
tibble(DateTime) %>% 
   mutate(Index =floor_date(DateTime, unit = 'minute'),
      Index = match(Index, unique(Index)))

-output

# A tibble: 25 × 2
   DateTime            Index
   <dttm>              <int>
 1 2021-07-12 21:28:26     1
 2 2021-07-12 21:28:27     1
 3 2021-07-12 21:28:28     1
 4 2021-07-12 21:28:29     1
 5 2021-07-12 21:28:30     1
 6 2021-07-13 00:14:28     2
 7 2021-07-13 00:14:29     2
 8 2021-07-13 00:14:30     2
 9 2021-07-13 00:14:31     2
10 2021-07-13 00:14:32     2
# … with 15 more rows

CodePudding user response:

Here is an alternative way:

library(dplyr)

tibble(DateTime) %>% 
  mutate(DateTime1 = lag(DateTime, default = DateTime[1])) %>% 
  mutate(helper = DateTime - DateTime1) %>% 
  group_by(Index = cumsum(helper!=1)) %>% 
  select(-DateTime1, -helper)

data:

DateTime <- structure(c(1626139706, 1626139707, 1626139708, 1626139709, 1626139710, 1626149668, 1626149669, 1626149670, 1626149671, 1626149672, 1626149673, 1626149674, 1626149675, 1626149676, 1627424165, 1627424166, 1627424167, 1627424168, 1627424169, 1627424170, 1627424171, 1627424172, 1627424173, 1627424174, 1627424175), 
          class = c("POSIXct", "POSIXt"), tzone = "")

output:

DateTime            Index
   <dttm>              <int>
 1 2021-07-13 03:28:26     1
 2 2021-07-13 03:28:27     1
 3 2021-07-13 03:28:28     1
 4 2021-07-13 03:28:29     1
 5 2021-07-13 03:28:30     1
 6 2021-07-13 06:14:28     2
 7 2021-07-13 06:14:29     2
 8 2021-07-13 06:14:30     2
 9 2021-07-13 06:14:31     2
10 2021-07-13 06:14:32     2
11 2021-07-13 06:14:33     2
12 2021-07-13 06:14:34     2
13 2021-07-13 06:14:35     2
14 2021-07-13 06:14:36     2
15 2021-07-28 00:16:05     3
16 2021-07-28 00:16:06     3
17 2021-07-28 00:16:07     3
18 2021-07-28 00:16:08     3
19 2021-07-28 00:16:09     3
20 2021-07-28 00:16:10     3
21 2021-07-28 00:16:11     3
22 2021-07-28 00:16:12     3
23 2021-07-28 00:16:13     3
24 2021-07-28 00:16:14     3
25 2021-07-28 00:16:15     3
  • Related