Fast way to compare time objects in R

Let's assume the following data.frame:

set.seed(20221117)
df <- data.frame(x = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"),
                 y = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"))

What would be a reasonably fast way to select the maximum of x and y for each row (ideally without having to explicitly convert to double)?

CodePudding user response：

do.call(pmax, df)

[1] "2020-11-30 22:09:29 GMT" "2026-06-14 20:00:05 GMT"
[3] "2008-02-08 01:32:23 GMT" "2021-06-17 10:44:05 GMT"
[5] "2025-02-18 23:20:28 GMT" "1997-03-27 18:10:44 GMT"
...
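
For anyone wondering why this works without a conversion step: `pmax` compares its arguments element-wise, and since a data.frame is a list of columns, `do.call` hands each column to `pmax` as a separate argument. A minimal sketch on made-up values (the vectors `a` and `b` and the data frame `small` are illustrative, not from the question):

```r
# Illustrative timestamps, not taken from the question's data
a <- as.POSIXct(c("2020-01-01 00:00:00", "2021-06-01 12:00:00"), tz = "UTC")
b <- as.POSIXct(c("2020-03-01 00:00:00", "2019-06-01 12:00:00"), tz = "UTC")
small <- data.frame(x = a, y = b)

# Same idea as pmax(small$x, small$y); unname() keeps column names from
# being matched against pmax's own na.rm argument
res <- do.call(pmax, unname(as.list(small)))

print(res)                       # the later timestamp of each row
print(inherits(res, "POSIXct"))  # the date-time class is kept
```

Dropping the names is optional for columns called `x` and `y`, but it avoids surprises if a column ever happens to be called `na.rm`.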

Benchmarking

library(dplyr)  # provides %>%, rowwise(), mutate() and pull()

bench::mark(
  Sindr = do.call(pmax, df),
  Tom   = df %>%
    rowwise() %>%
    mutate(max = max(c(x, y))) %>%
    pull(max)
)

  expression      min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
  <bch:expr> <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
1 Sindr        2.29ms  4.14ms   176.       6.49MB    49.9     88    25
2 Tom           6.59s   6.59s     0.152   24.09MB     7.28     1    48
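
The gap is what you would expect: `rowwise()` makes `max()` run once per row, whereas `pmax` handles whole columns in one call. As a rough base-R sanity check that the two strategies give the same answer (a sketch only; `n`, `vectorised`, `row_by_row` and `same` are names made up here, and the explicit loop just stands in for the row-wise approach):

```r
set.seed(1)
n <- 1000
x <- as.POSIXct(sample(2e9, n), origin = "1970-01-01", tz = "UTC")
y <- as.POSIXct(sample(2e9, n), origin = "1970-01-01", tz = "UTC")

# One vectorised call over both columns
vectorised <- pmax(x, y)

# One max() call per row; as.numeric() is only there so that vapply()
# can collect the per-row results into a plain numeric vector
row_by_row <- vapply(
  seq_len(n),
  function(i) as.numeric(max(x[i], y[i])),
  numeric(1)
)

# Compare the underlying seconds since the epoch
same <- all(as.numeric(vectorised) == row_by_row)
print(same)
```

Wrapping either expression in `system.time()` gives a quick feel for the difference if `bench` is not installed.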

CodePudding user response：

To keep the result as a column of the data.frame, you can use dplyr's rowwise():

library(dplyr)

df %>%
  rowwise() %>%
  mutate(max = max(c(x, y)))

# A tibble: 100,000 × 3
# Rowwise: 
   x                   y                   max                
   <dttm>              <dttm>              <dttm>             
 1 2032-12-21 10:50:13 1994-12-09 15:36:53 2032-12-21 10:50:13
 2 1988-03-11 14:43:53 1982-05-23 23:39:28 1988-03-11 14:43:53
 3 2004-10-17 08:34:41 1986-06-02 03:05:07 2004-10-17 08:34:41
 4 2028-03-02 09:27:44 1986-09-11 02:46:20 2028-03-02 09:27:44
 5 1985-02-17 09:39:02 2002-08-13 05:42:29 2002-08-13 05:42:29
 6 1977-08-04 13:26:26 2021-05-22 14:48:48 2021-05-22 14:48:48
 7 1995-05-21 01:13:46 2004-07-19 05:13:18 2004-07-19 05:13:18
 8 1988-03-26 21:59:08 1999-06-13 08:34:06 1999-06-13 08:34:06
 9 1977-11-15 23:54:57 2026-07-15 12:59:39 2026-07-15 12:59:39
10 2031-07-26 16:51:17 2017-01-12 15:23:15 2031-07-26 16:51:17
# … with 99,990 more rows
# ℹ Use `print(n = ...)` to see more rows
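
If the goal is just to keep the result as a column, it may be worth noting that `pmax` is already vectorised, so it can be used inside `mutate()` directly and `rowwise()` dropped. A sketch, assuming dplyr is installed and using a made-up two-row data frame (`small` and `out` are illustrative names):

```r
library(dplyr)

# Illustrative data, not the question's 100,000-row df
small <- data.frame(
  x = as.POSIXct(c("2020-01-01 00:00:00", "2021-06-01 12:00:00"), tz = "UTC"),
  y = as.POSIXct(c("2020-03-01 00:00:00", "2019-06-01 12:00:00"), tz = "UTC")
)

# pmax() works on whole columns, so no rowwise() is needed
out <- small %>%
  mutate(max = pmax(x, y))

print(out)
```

This is a combination of the two answers rather than something either of them posted, so it is worth benchmarking on your own data.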