For a given positive integer n
, I want to know the fastest base R (not Rcpp
) algorithm for constructing the integer vector c(1L, 1:2, ..., 1:n)
, which has length n*(n 1)/2
. There are bonus points for fast and memory-efficient algorithms, since I ultimately want to avoid allocating a vector of length n*n
.
I'm aware of at least two approaches:
unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
{J <- .row(c(n, n)); J[upper.tri(J, TRUE)]}
the latter being particularly inefficient since it allocates two integer vectors of length n*n
.
Note that if we assign the value .col(c(n, n))
to J
above, then we obtain the sequence 1 2 2 3 3 3 4 4 4 4 ...
. This sequence can be constructed very fast and efficiently with {i <- seq_len(n); rep.int(i, i)}
.
I am wondering if a similarly fast (or faster) algorithm exists in the .row(c(n, n))
case, or if unlist
-lapply
is optimal from a base R standpoint.
FWIW, here is a benchmark of the three procedures I've mentioned so far:
## Seemingly optimal for 1 2 2 3 3 3 4 4 4 4 ...
f0 <- function(n) {i <- seq_len(n); rep.int(i, i)}
## Candidates for 1 1 2 1 2 3 1 2 3 4 ... (the sequence I actually want)
f1 <- function(n) unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
f2 <- function(n) {J <- .row(c(n, n)); J[upper.tri(J, TRUE)]}
n <- 1000L
microbenchmark::microbenchmark(f0(n), f1(n), f2(n), times = 10000L)
Unit: milliseconds
expr min lq mean median uq max neval
f0(n) 1.711873 1.797891 2.112043 1.810273 1.836636 14.96644 10000
f1(n) 1.986737 2.108630 2.472612 2.148195 2.214369 15.16282 10000
f2(n) 3.785981 4.624821 5.551115 5.051405 5.861954 17.28740 10000
(I'm aware that f1
is pretty close to f0
here, but is there something better than f1
?)
CodePudding user response:
I think sequence
is the one you are after (if you are not going to use Rcpp
for a even faster version)
f1 <- function(n) unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
f2 <- function(n) {
J <- .row(c(n, n))
J[upper.tri(J, TRUE)]
}
f3 <- function(n) {
v <- 1:n
data.table::rowid(rep.int(v, v))
}
f4 <- function(n) sequence(1:n)
n <- 1000L
microbenchmark::microbenchmark(f1(n), f2(n), f3(n), f4(n), check = "identical")
Benchmarking
> microbenchmark::microbenchmark(f1(n), f2(n), f3(n), f4(n), check = "identical")
Unit: microseconds
expr min lq mean median uq max neval
f1(n) 3928.8 4144.50 5185.839 4227.5 4289.15 67457.1 100
f2(n) 9490.3 10083.90 14415.777 12951.0 15080.50 78014.2 100
f3(n) 8083.5 8572.10 12154.922 9063.0 9534.45 75408.7 100
f4(n) 213.9 425.05 787.637 442.6 494.00 7844.4 100
CodePudding user response:
I'm not sure what you're aware of, but if function from base
is okay, try sequence
.
f3 <- function(n) {sequence(1:n)}
It seems it's almost 2~3 times faster than f0
CodePudding user response:
These 2 may also be options-
n <- 5
unlist(purrr::map(seq(5), ~seq(.x)))
#> [1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5
unlist(mapply(FUN = function(.x) seq(.x), seq(n)))
#> [1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5
Created on 2021-12-10 by the reprex package (v2.0.1)