Count the rows with consecutive time periods in R-CodePudding

I have a large file of timestamp data, which looks as follows. The dataset covers date and Time in increments of 1 minute. A small subset of the data frame looks like:

Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")

Time <- c("17:15:00","17:16:00", "17:17:00", "17:21:00", "17:22:00", "17:25:00", "17:26:00", "17:28:00")

idleness <- c(1, 1, 1, 1, 1, 1, 1, 1)
df1 <- data.frame(Date, Time, idleness)

Then I convert all the Time values of the data frame into 'times' data type.

df1$Time <- chron(times = df1$Time)

I want to produce a graph to look at minutes of continuous idle (one, two, three min etc) vs counts (how many clumps of consecutive idle minutes).

All time values are in 1-minute intervals. When this flow of 1-minute iterations is disrupted (after 17:17:00, it is not stamped at 17:18:00 hence it is disrupted), it should count the next timestamp as the start of a new consecutive time flow and again calculate the number of rows until this flow is disrupted again. For example in the example data frame, there's only one, 3-minute idle period (17:15:00","17:16:00", "17:17:00"), two 2-minutes idle periods ("17:21:00", "17:22:00" and "17:25:00", "17:26:00") and only one 1-minute idele period ("17:28:00").

Hence the output I expect should look like this:

Continuous Idle Minutes	Count
1 min	1
2 min	2
3 min	1

I hope all is clear - if not, I'd be happy to clarify it further. Thank you very much in advance. I would be grateful if someone can help me out with this.

CodePudding user response：

Try this:

df1$timestamp <- as.POSIXct(paste(df1$Date, df1$Time), format = "%m/%d/%Y %H:%M:%S")
df1$grp <- cumsum(c(TRUE, diff(df1$timestamp) != 1))
df1
#         Date     Time idleness           timestamp grp
# 1 03/06/2019 17:15:00        1 2019-03-06 17:15:00   1
# 2 03/06/2019 17:16:00        1 2019-03-06 17:16:00   1
# 3 03/06/2019 17:17:00        1 2019-03-06 17:17:00   1
# 4 03/06/2019 17:21:00        1 2019-03-06 17:21:00   2
# 5 03/06/2019 17:22:00        1 2019-03-06 17:22:00   2
# 6 03/06/2019 17:25:00        1 2019-03-06 17:25:00   3
# 7 03/06/2019 17:26:00        1 2019-03-06 17:26:00   3
# 8 03/06/2019 17:28:00        1 2019-03-06 17:28:00   4
df2 <- aggregate(timestamp ~ grp, data = df1, FUN = function(z) as.numeric(diff(range(z)), units = "mins"))
df2
#   grp timestamp
# 1   1         2
# 2   2         1
# 3   3         1
# 4   4         0
table(df2$timestamp)
# 0 1 2 
# 1 2 1

where the 0 1 2 (which are strings) are the number of minutes, and the 1 2 1 are the number of times that gap-minutes occurred.

The last is 0 because the 17:28:00 is by itself, so its range is 0.