Home > Back-end >  remove intervals between timestamp in R
remove intervals between timestamp in R

Time:03-23

I want to remove intervals in my df which are not on the hour or an hour apart from the previous row.

For example:

timestamp
2021-03-01 12:00
2021-03-01 12:10
2021-03-01 12:20
2021-03-01 12:30
2021-03-01 12:40
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 12:10
2021-03-02 12:20
2021-03-02 12:30
2021-03-02 12:40
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 11:10
2021-03-03 11:20
2021-03-03 11:30
2021-03-03 11:40
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10

The df as many dates and the intervals can occur between any hour, and not all intervals are 10 mins.

What I would like to end up with is:

timestamp

2021-03-01 12:00
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10

TIA

CodePudding user response:

You can use function round_date() from lubridate.

library(lubridate)

df <- data.frame(id = 1:4,
                 timestamp = ymd_hm(
                   c(
                     "2021-03-01 12:10",
                     "2021-03-01 12:00",
                     "2021-03-01 13:30",
                     "2021-03-01 14:00"
                   )
                 ))

precise <- round_date(df$timestamp, unit = "hour")

df |> dplyr::filter(timestamp %in% precise)

(PS: these are date-times, not intervals).

CodePudding user response:

Another option, also using lubridate

library(dplyr)
library(lubridate)

timestamp <- c("2021-03-01 12:00", "2021-03-01 12:10", "2021-03-01 12:20", "2021-03-01 12:30",
               "2021-03-01 12:40", "2021-03-01 13:00", "2021-03-01 14:30", "2021-03-01 15:30",
               "2021-03-01 16:30", "2021-03-02 12:00", "2021-03-02 12:10", "2021-03-02 12:20",
               "2021-03-02 12:30", "2021-03-02 12:40", "2021-03-02 13:00", "2021-03-03 11:00",
               "2021-03-03 11:10", "2021-03-03 11:20", "2021-03-03 11:30", "2021-03-03 11:40",
               "2021-03-03 12:00", "2021-03-03 13:10", "2021-03-03 14:10", "2021-03-03 15:10")

data <- data.frame(timestamp)

timestamp_exit <- data %>% 
  mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>% 
  filter(minute(timestamp) == 0)

Output

> timestamp_exit
         timestamp
1 2021-03-01 12:00
2 2021-03-01 13:00
3 2021-03-02 12:00
4 2021-03-02 13:00
5 2021-03-03 11:00
6 2021-03-03 12:00

As for the time intervals: This is a little function that rolls along your timestamp column and finds each consecutive timestamp being at least one hour apart. Note however, that there is no rounding up the minutes here, but it matches your desired output.

hourinterval <- function(interval, ind = 1) {
  ind.next <- first(which(difftime(interval, interval[ind], units="hours") >= 1))
if(is.na(ind.next))
    return(ind)
  else
    return(c(ind, hourinterval(interval, ind.next)))
}

timestamp_exit2 <- data %>% 
  mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>%
  slice(hourinterval(timestamp))

Output

> timestamp_exit2
          timestamp
1  2021-03-01 12:00
2  2021-03-01 13:00
3  2021-03-01 14:30
4  2021-03-01 15:30
5  2021-03-01 16:30
6  2021-03-02 12:00
7  2021-03-02 13:00
8  2021-03-03 11:00
9  2021-03-03 12:00
10 2021-03-03 13:10
11 2021-03-03 14:10
12 2021-03-03 15:10
  • Related