Let's assume the following data.frame:
set.seed(20221117)
df <- data.frame(x = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"),
                 y = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"))
What would be a reasonably fast way to select the maximum for each row (ideally without having to explicitly convert to double)?
CodePudding user response:
do.call(pmax, df)
[1] "2020-11-30 22:09:29 GMT" "2026-06-14 20:00:05 GMT"
[3] "2008-02-08 01:32:23 GMT" "2021-06-17 10:44:05 GMT"
[5] "2025-02-18 23:20:28 GMT" "1997-03-27 18:10:44 GMT"
...
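This works without any conversion to double because a data.frame is a list of columns: do.call(pmax, df) amounts to pmax(df$x, df$y), and pmax() compares the POSIXct values directly. A minimal sketch on a three-row data.frame (the name `small` and its values are made up for illustration, they are not the question's data):

```r
# Hypothetical three-row example, not the data from the question
small <- data.frame(
  x = as.POSIXct(c("2020-01-01 00:00:00", "2021-06-15 12:00:00",
                   "1999-12-31 23:59:59"), tz = "UTC"),
  y = as.POSIXct(c("2019-05-05 05:05:05", "2022-02-02 02:02:02",
                   "1999-12-31 23:59:58"), tz = "UTC")
)

# do.call() passes each column of the data.frame to pmax() as one argument
row_max <- do.call(pmax, small)

# Spelling the columns out by hand gives the same element-wise maximum
by_hand <- pmax(small$x, small$y)

print(row_max)
print(all(row_max == by_hand))

# The result is still a date-time vector, not a plain numeric
print(class(row_max))
```

One caveat: the column names are passed along as argument names, so a column that happens to be called na.rm would be taken as pmax()'s na.rm argument rather than as data.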
Benchmarking
library(dplyr)  # provides %>%, rowwise(), mutate() and pull()

bench::mark(
  Sindr = do.call(pmax, df),
  Tom = df %>%
    rowwise() %>%
    mutate(max = max(c(x, y))) %>%
    pull(max)
)
  expression      min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
  <bch:expr> <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
1 Sindr        2.29ms  4.14ms   176.       6.49MB    49.9     88    25
2 Tom           6.59s   6.59s     0.152   24.09MB     7.28     1    48
CodePudding user response:
With the data in data.frame format, you can use dplyr's rowwise():
library(dplyr)

df %>%
  rowwise() %>%
  mutate(max = max(c(x, y)))
# A tibble: 100,000 × 3
# Rowwise:
   x                   y                   max
   <dttm>              <dttm>              <dttm>
 1 2032-12-21 10:50:13 1994-12-09 15:36:53 2032-12-21 10:50:13
 2 1988-03-11 14:43:53 1982-05-23 23:39:28 1988-03-11 14:43:53
 3 2004-10-17 08:34:41 1986-06-02 03:05:07 2004-10-17 08:34:41
 4 2028-03-02 09:27:44 1986-09-11 02:46:20 2028-03-02 09:27:44
 5 1985-02-17 09:39:02 2002-08-13 05:42:29 2002-08-13 05:42:29
 6 1977-08-04 13:26:26 2021-05-22 14:48:48 2021-05-22 14:48:48
 7 1995-05-21 01:13:46 2004-07-19 05:13:18 2004-07-19 05:13:18
 8 1988-03-26 21:59:08 1999-06-13 08:34:06 1999-06-13 08:34:06
 9 1977-11-15 23:54:57 2026-07-15 12:59:39 2026-07-15 12:59:39
10 2031-07-26 16:51:17 2017-01-12 15:23:15 2031-07-26 16:51:17
# … with 99,990 more rows
# ℹ Use `print(n = ...)` to see more rows
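Note that rowwise() evaluates max(c(x, y)) once per row, which gets slow on large inputs. If the goal is the same shape of result (the original columns plus a max column), a vectorised pmax() call should get there in a single pass. A minimal base R sketch, assuming a small made-up data.frame (`small` and `with_max` are hypothetical names used only here):

```r
# Hypothetical example data: 1,000 random date-times per column
set.seed(1)
small <- data.frame(
  x = as.POSIXct(sample(2e9, 1e3), origin = "1970-01-01", tz = "UTC"),
  y = as.POSIXct(sample(2e9, 1e3), origin = "1970-01-01", tz = "UTC")
)

# Add the row-wise maximum as a new column; pmax() works on whole columns,
# so no per-row loop or rowwise() grouping is needed
with_max <- transform(small, max = pmax(x, y))

print(head(with_max))
print(class(with_max$max))
```

With dplyr loaded, the same idea can be written as df %>% mutate(max = pmax(x, y)), that is, the pipeline above without the rowwise() step.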