I have a large file of timestamp data, which looks as follows. The dataset covers date and Time in increments of 1 minute. A small subset of the data frame looks like:
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00","17:16:00", "17:17:00", "17:21:00", "17:22:00", "17:25:00", "17:26:00", "17:28:00")
idleness <- c(1, 1, 1, 1, 1, 1, 1, 1)
df1 <- data.frame(Date, Time, idleness)
Then I convert all the Time values of the data frame into 'times' data type.
df1$Time <- chron(times = df1$Time)
I want to produce a graph to look at minutes of continuous idle (one, two, three min etc) vs counts (how many clumps of consecutive idle minutes).
All time values are in 1-minute intervals. When this flow of 1-minute iterations is disrupted (after 17:17:00, it is not stamped at 17:18:00 hence it is disrupted), it should count the next timestamp as the start of a new consecutive time flow and again calculate the number of rows until this flow is disrupted again. For example in the example data frame, there's only one, 3-minute idle period (17:15:00","17:16:00", "17:17:00"), two 2-minutes idle periods ("17:21:00", "17:22:00" and "17:25:00", "17:26:00") and only one 1-minute idele period ("17:28:00").
Hence the output I expect should look like this:
Continuous Idle Minutes | Count |
---|---|
1 min | 1 |
2 min | 2 |
3 min | 1 |
I hope all is clear - if not, I'd be happy to clarify it further. Thank you very much in advance. I would be grateful if someone can help me out with this.
CodePudding user response:
Try this:
df1$timestamp <- as.POSIXct(paste(df1$Date, df1$Time), format = "%m/%d/%Y %H:%M:%S")
df1$grp <- cumsum(c(TRUE, diff(df1$timestamp) != 1))
df1
# Date Time idleness timestamp grp
# 1 03/06/2019 17:15:00 1 2019-03-06 17:15:00 1
# 2 03/06/2019 17:16:00 1 2019-03-06 17:16:00 1
# 3 03/06/2019 17:17:00 1 2019-03-06 17:17:00 1
# 4 03/06/2019 17:21:00 1 2019-03-06 17:21:00 2
# 5 03/06/2019 17:22:00 1 2019-03-06 17:22:00 2
# 6 03/06/2019 17:25:00 1 2019-03-06 17:25:00 3
# 7 03/06/2019 17:26:00 1 2019-03-06 17:26:00 3
# 8 03/06/2019 17:28:00 1 2019-03-06 17:28:00 4
df2 <- aggregate(timestamp ~ grp, data = df1, FUN = function(z) as.numeric(diff(range(z)), units = "mins"))
df2
# grp timestamp
# 1 1 2
# 2 2 1
# 3 3 1
# 4 4 0
table(df2$timestamp)
# 0 1 2
# 1 2 1
where the 0 1 2
(which are strings) are the number of minutes, and the 1 2 1
are the number of times that gap-minutes occurred.
The last is 0
because the 17:28:00
is by itself, so its range
is 0.