I want to remove intervals in my df which are not on the hour or an hour apart from the previous row.
For example:
timestamp
2021-03-01 12:00
2021-03-01 12:10
2021-03-01 12:20
2021-03-01 12:30
2021-03-01 12:40
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 12:10
2021-03-02 12:20
2021-03-02 12:30
2021-03-02 12:40
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 11:10
2021-03-03 11:20
2021-03-03 11:30
2021-03-03 11:40
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10
The df as many dates and the intervals can occur between any hour, and not all intervals are 10 mins.
What I would like to end up with is:
timestamp
2021-03-01 12:00
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10
TIA
CodePudding user response:
You can use function round_date()
from lubridate
.
library(lubridate)
df <- data.frame(id = 1:4,
timestamp = ymd_hm(
c(
"2021-03-01 12:10",
"2021-03-01 12:00",
"2021-03-01 13:30",
"2021-03-01 14:00"
)
))
precise <- round_date(df$timestamp, unit = "hour")
df |> dplyr::filter(timestamp %in% precise)
(PS: these are date-times, not intervals).
CodePudding user response:
Another option, also using lubridate
library(dplyr)
library(lubridate)
timestamp <- c("2021-03-01 12:00", "2021-03-01 12:10", "2021-03-01 12:20", "2021-03-01 12:30",
"2021-03-01 12:40", "2021-03-01 13:00", "2021-03-01 14:30", "2021-03-01 15:30",
"2021-03-01 16:30", "2021-03-02 12:00", "2021-03-02 12:10", "2021-03-02 12:20",
"2021-03-02 12:30", "2021-03-02 12:40", "2021-03-02 13:00", "2021-03-03 11:00",
"2021-03-03 11:10", "2021-03-03 11:20", "2021-03-03 11:30", "2021-03-03 11:40",
"2021-03-03 12:00", "2021-03-03 13:10", "2021-03-03 14:10", "2021-03-03 15:10")
data <- data.frame(timestamp)
timestamp_exit <- data %>%
mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>%
filter(minute(timestamp) == 0)
Output
> timestamp_exit
timestamp
1 2021-03-01 12:00
2 2021-03-01 13:00
3 2021-03-02 12:00
4 2021-03-02 13:00
5 2021-03-03 11:00
6 2021-03-03 12:00
As for the time intervals: This is a little function that rolls along your timestamp
column and finds each consecutive timestamp being at least one hour apart. Note however, that there is no rounding up the minutes here, but it matches your desired output.
hourinterval <- function(interval, ind = 1) {
ind.next <- first(which(difftime(interval, interval[ind], units="hours") >= 1))
if(is.na(ind.next))
return(ind)
else
return(c(ind, hourinterval(interval, ind.next)))
}
timestamp_exit2 <- data %>%
mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>%
slice(hourinterval(timestamp))
Output
> timestamp_exit2
timestamp
1 2021-03-01 12:00
2 2021-03-01 13:00
3 2021-03-01 14:30
4 2021-03-01 15:30
5 2021-03-01 16:30
6 2021-03-02 12:00
7 2021-03-02 13:00
8 2021-03-03 11:00
9 2021-03-03 12:00
10 2021-03-03 13:10
11 2021-03-03 14:10
12 2021-03-03 15:10