Fastest way to construct the sequence `c(1:1, 1:2, ..., 1:n)`

For a given positive integer n, I want to know the fastest base R (not Rcpp) algorithm for constructing the integer vector c(1:1, 1:2, ..., 1:n), which has length n*(n+1)/2. There are bonus points for algorithms that are both fast and memory-efficient, since I ultimately want to avoid allocating a vector of length n*n.

I'm aware of at least two approaches:

unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
{J <- .row(c(n, n)); J[upper.tri(J, TRUE)]}

the latter being particularly inefficient since it allocates two integer vectors of length n*n.

Note that if we assign the value .col(c(n, n)) to J above, then we obtain the sequence 1 2 2 3 3 3 4 4 4 4 .... This sequence can be constructed fast and efficiently with {i <- seq_len(n); rep.int(i, i)}.
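To make the relationship between the two index matrices concrete, here is a small check for n = 4. It is only an illustration built from the functions already mentioned; the variable names (d, R, C, by_row, by_col) are made up for this sketch.

```r
n <- 4L
d <- c(n, n)

# Row indices of the upper triangle (diagonal included), read column by column.
R <- .row(d)
by_row <- R[upper.tri(R, TRUE)]

# Column indices of the same entries.
C <- .col(d)
by_col <- C[upper.tri(C, TRUE)]

i <- seq_len(n)
# by_row is the sequence I want: 1 1 2 1 2 3 1 2 3 4
stopifnot(identical(by_row, unlist(lapply(i, seq_len), FALSE, FALSE)))
# by_col is the "easy" sequence: 1 2 2 3 3 3 4 4 4 4
stopifnot(identical(by_col, rep.int(i, i)))
```

Both extractions go through an n-by-n integer matrix and an n-by-n logical mask, which is exactly the allocation I would like to avoid.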

I am wondering if a similarly fast (or faster) algorithm exists in the .row(c(n, n)) case, or if unlist-lapply is optimal from a base R standpoint.

FWIW, here is a benchmark of the three procedures I've mentioned so far:

## Seemingly optimal for 1 2 2 3 3 3 4 4 4 4 ...
f0 <- function(n) {i <- seq_len(n); rep.int(i, i)}
## Candidates for 1 1 2 1 2 3 1 2 3 4 ... (the sequence I actually want)
f1 <- function(n) unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
f2 <- function(n) {J <- .row(c(n, n)); J[upper.tri(J, TRUE)]}

n <- 1000L
microbenchmark::microbenchmark(f0(n), f1(n), f2(n), times = 10000L)

Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
 f0(n) 1.711873 1.797891 2.112043 1.810273 1.836636 14.96644 10000
 f1(n) 1.986737 2.108630 2.472612 2.148195 2.214369 15.16282 10000
 f2(n) 3.785981 4.624821 5.551115 5.051405 5.861954 17.28740 10000

(I'm aware that f1 is pretty close to f0 here, but is there something better than f1?)

CodePudding user response：

I'm not sure what you're already aware of, but if a function from base is okay, try sequence.

f3 <- function(n) {sequence(1:n)}

It seems to be roughly 2 to 3 times faster than f0.
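For reference, sequence(nvec) concatenates seq_len(k) for each k in nvec, so passing 1:n gives exactly the requested vector. A quick sanity check for a small n (the variable names here are only for illustration):

```r
n <- 4L
s <- sequence(seq_len(n))

# Same result as the unlist-lapply approach from the question.
stopifnot(identical(s, unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)))

# The length is the n-th triangular number.
stopifnot(length(s) == n * (n + 1L) / 2L)
```

Using seq_len(n) instead of 1:n only matters if n could be 0; for positive n the two are interchangeable.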

CodePudding user response：

I think sequence is the one you are after (if you are not going to use Rcpp for an even faster version).

f1 <- function(n) unlist(lapply(seq_len(n), seq_len), FALSE, FALSE)
f2 <- function(n) {
  J <- .row(c(n, n))
  J[upper.tri(J, TRUE)]
}
f3 <- function(n) {
  v <- 1:n
  data.table::rowid(rep.int(v, v))
}
f4 <- function(n) sequence(1:n)

n <- 1000L
microbenchmark::microbenchmark(f1(n), f2(n), f3(n), f4(n), check = "identical")

Benchmarking

> microbenchmark::microbenchmark(f1(n), f2(n), f3(n), f4(n), check = "identical")
Unit: microseconds
  expr    min       lq      mean  median       uq     max neval
 f1(n) 3928.8  4144.50  5185.839  4227.5  4289.15 67457.1   100
 f2(n) 9490.3 10083.90 14415.777 12951.0 15080.50 78014.2   100
 f3(n) 8083.5  8572.10 12154.922  9063.0  9534.45 75408.7   100
 f4(n)  213.9   425.05   787.637   442.6   494.00  7844.4   100
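If sequence is not an option for some reason, the same vector can also be built with plain vectorized arithmetic. The sketch below is not part of the benchmark above and I have not timed it; f5 is a hypothetical name. The idea is to subtract from each position the number of elements that precede its block.

```r
# Hypothetical helper, not benchmarked: positions minus per-block offsets.
f5 <- function(n) {
  i <- seq_len(n)
  # Number of elements before block k: 0, 1, 3, 6, ...
  offset <- cumsum(i) - i
  # Position in the full vector minus the offset of the block it falls in.
  seq_len(sum(i)) - rep.int(offset, i)
}

stopifnot(identical(f5(1000L), sequence(1:1000)))
```

This works with vectors of length n*(n+1)/2 and never allocates anything of length n*n. Note that the integer cumsum and sum overflow once n*(n+1)/2 exceeds .Machine$integer.max, so the sketch assumes n is well below that point.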

CodePudding user response：

These two may also be options:

n <- 5

unlist(purrr::map(seq(n), ~seq(.x)))
#>  [1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5

unlist(mapply(FUN = function(.x) seq(.x), seq(n)))
#>  [1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5

Created on 2021-12-10 by the reprex package (v2.0.1)