Non-equi time filtering on a POSIXct variable in R

I'm wondering if there is a simple way to filter by time of day using a POSIXct date-time variable.

I found that non-equi filtering on a time (hms) variable is straightforward:

> apple_data
# A tibble: 10 × 6
   SYMBOL DATE       TIME        BB    BO date_time          
   <chr>  <date>     <time>   <dbl> <dbl> <dttm>             
 1 AAPL   2009-01-02 09:30:00  85.6  85.6 2009-01-02 09:30:00
 2 AAPL   2009-01-02 09:30:01  85.6  85.9 2009-01-02 09:30:01
 3 AAPL   2009-01-02 09:30:02  85.6  85.7 2009-01-02 09:30:02
 4 AAPL   2009-01-02 09:30:03  85.6  85.7 2009-01-02 09:30:03
 5 AAPL   2009-01-02 09:30:04  85.6  85.8 2009-01-02 09:30:04
 6 AAPL   2009-01-02 09:30:05  85.6  85.7 2009-01-02 09:30:05
 7 AAPL   2009-01-02 09:30:06  85.6  85.7 2009-01-02 09:30:06
 8 AAPL   2009-01-02 09:30:07  85.6  85.7 2009-01-02 09:30:07
 9 AAPL   2009-01-02 09:30:08  85.6  85.7 2009-01-02 09:30:08
10 AAPL   2009-01-02 09:30:09  85.6  85.7 2009-01-02 09:30:09

apple_data %>% filter(TIME <= as_hms("09:30:05"), TIME >= as_hms("09:30:03"))

# A tibble: 3 × 6
  SYMBOL DATE       TIME        BB    BO date_time          
  <chr>  <date>     <time>   <dbl> <dbl> <dttm>             
1 AAPL   2009-01-02 09:30:03  85.6  85.7 2009-01-02 09:30:03
2 AAPL   2009-01-02 09:30:04  85.6  85.8 2009-01-02 09:30:04
3 AAPL   2009-01-02 09:30:05  85.6  85.7 2009-01-02 09:30:05

Question 1

If I have only date_time (POSIXct) and no separate DATE and TIME variables, how can I do non-equi filtering on the time of day alone?

Question 2

I tried extracting the time from date_time with format(date_time, "%T") and found that the filter still works even though the result is a character string. However, converting that string to hms is too slow on large data, and I need the hms type for merging with other data.

Is there a fast way to convert a string to hms, or to extract hms from date_time directly so that I can skip this costly type conversion? Any suggestions are greatly appreciated.

Reprex

apple_data <- structure(list(SYMBOL = structure(c("AAPL", "AAPL", "AAPL", "AAPL", 
"AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL"), label = "Stock Symbol"), 
    DATE = structure(c(14246, 14246, 14246, 14246, 14246, 14246, 
    14246, 14246, 14246, 14246), label = "Quote date", format.sas = "YYMMDDN8", class = "Date"), 
    TIME = structure(c(34200, 34201, 34202, 34203, 34204, 34205, 
    34206, 34207, 34208, 34209), class = c("hms", "difftime"), units = "secs"), 
    BB = structure(c(85.55, 85.6, 85.56, 85.55, 85.57, 85.56, 
    85.61, 85.61, 85.62, 85.62), label = "Best Bid"), BO = structure(c(85.6, 
    85.86, 85.66, 85.66, 85.8, 85.66, 85.66, 85.66, 85.73, 85.73
    ), label = "Best Offer"), date_time = structure(c(1230888600, 
    1230888601, 1230888602, 1230888603, 1230888604, 1230888605, 
    1230888606, 1230888607, 1230888608, 1230888609), tzone = "UTC", format.sas = "DATETIME20", class = c("POSIXct", 
    "POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

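1) Express both the comparison times and the time-of-day part of date_time as seconds since midnight, then filter on that numeric range, rng. This avoids any character processing of date_time. Note that 86400 equals 24 * 60 * 60, the number of seconds in a day, so as.numeric(date_time) %% 86400 is the number of seconds since midnight. This relies on date_time being stored in UTC, as it is in the reprex.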

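# Load dplyr without masking stats::filter() and stats::lag()
library(dplyr, exclude = c("filter", "lag"))

# Bounds as seconds since midnight: 34203 and 34205
rng <- as.difftime(c("09:30:03", "09:30:05"), units = "secs")
apple_data %>%
  dplyr::filter(between(as.numeric(date_time) %% 86400, !!!rng))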

giving:

# A tibble: 3 × 6
  SYMBOL DATE       TIME          BB    BO date_time          
  <chr>  <date>     <hms>      <dbl> <dbl> <dttm>             
1 AAPL   2009-01-02 34203 secs  85.6  85.7 2009-01-02 09:30:03
2 AAPL   2009-01-02 34204 secs  85.6  85.8 2009-01-02 09:30:04
3 AAPL   2009-01-02 34205 secs  85.6  85.7 2009-01-02 09:30:05
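
The modulo step can be checked by hand in base R. The following is a minimal sketch, not part of the original answer: it rebuilds three of the reprex's date_time values from their epoch seconds and applies the same comparison with plain numeric bounds. The names secs, lower, upper, and keep are illustrative only, and the calculation assumes the values are in UTC.

```r
# Three date_time values from the reprex, rebuilt from epoch seconds (UTC)
date_time <- as.POSIXct(c(1230888600, 1230888603, 1230888609),
                        origin = "1970-01-01", tz = "UTC")

# Seconds since midnight: the remainder after removing whole days
secs <- as.numeric(date_time) %% 86400
secs
# 34200 34203 34209

# The same bounds as in the answer, written as plain seconds
lower <- 9 * 3600 + 30 * 60 + 3  # 09:30:03
upper <- 9 * 3600 + 30 * 60 + 5  # 09:30:05

# Inclusive range test; only 09:30:03 falls inside the window
keep <- secs >= lower & secs <= upper
date_time[keep]
```

Because everything stays numeric, no string formatting or parsing is involved at any point.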

2) A base R version of the above is nearly the same.

# Inclusive range test; bounds passed via ... are reduced to their range
Between <- function(x, ..., rng = range(c(...))) x >= rng[1] & x <= rng[2]
rng <- as.difftime(c("09:30:03", "09:30:05"), units = "secs")
apple_data |>
  subset(Between(as.numeric(date_time) %% 86400, rng))
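
For Question 2, the same arithmetic may also give a way to build a time-of-day column without going through format() strings. This is a sketch under assumptions, not something stated in the answer above: the variable names are hypothetical, the POSIXlt variant is offered for data that is not stored in UTC, and the hms step runs only if the hms package is installed.

```r
# Three date_time values from the reprex, rebuilt from epoch seconds (UTC)
date_time <- as.POSIXct(c(1230888600, 1230888603, 1230888609),
                        origin = "1970-01-01", tz = "UTC")

# Numeric route: seconds since midnight, valid as-is only for UTC data
secs_utc <- as.numeric(date_time) %% 86400

# Time-zone-aware route: POSIXlt exposes the clock fields in the
# time zone attached to date_time
lt <- as.POSIXlt(date_time)
secs_local <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Base R time-of-day column as a difftime measured in seconds
time_of_day <- as.difftime(secs_utc, units = "secs")

# With the hms package available, build an hms column from the same numbers
if (requireNamespace("hms", quietly = TRUE)) {
  time_hms <- hms::hms(seconds = secs_utc)
}
```

Both routes work on numeric vectors, so they should avoid the string-to-hms conversion described in the question; whether that is fast enough on the full data set would need to be measured.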