Creating a complex sequence in R that skips the iteration number, without using a loop

The following sequence:

1 2 3 1 2 3 1 2 3

can be generated with a loop

x <- c(); for (i in 1:3) x <- c(x, 1:3)

or (preferably) without a loop

x <- rep(1:3, 3)

Now, I want to remove i from the sequence in the i-th iteration, to get:

2 3 1 3 1 2

This is easy to achieve by modifying the loop, but how can it be done without a loop?

CodePudding user response：

This is an idea:

x <- rep(1:3, 3)
x[x != rep(1:3, each = 3)]

# [1] 2 3 1 3 1 2
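
To make the idea easier to follow for a general n, here is a minimal sketch. The helper name `drop_own_index` and the intermediate variable `block` are introduced here for illustration only and are not part of the answer above.

```r
# Hypothetical helper generalising the idea above to any n
drop_own_index <- function(n) {
  x <- rep(1:n, n)              # 1..n repeated n times: 1 2 3 1 2 3 ...
  block <- rep(1:n, each = n)   # block (iteration) number of each element: 1 1 1 2 2 2 ...
  x[x != block]                 # keep only elements that differ from their block number
}

drop_own_index(3)
# [1] 2 3 1 3 1 2
```

The comparison `x != block` is FALSE exactly once per block, at the position where the value equals the iteration number, so one element is dropped from each block.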

CodePudding user response：

I thought about this before. I did:

n <- 3
x <- rep(1:n, n)[-seq.int(1, n * n, by = n + 1)]
#[1] 2 3 1 3 1 2

Nothing will be faster than this for big n (unless, of course, we code the entire loop in C/C++).

Interpretation

This is the same as dropping the diagonal elements from the following matrix:

matrix(rep(1:n, n), n)
#     [,1] [,2] [,3]
#[1,]    1    1    1
#[2,]    2    2    2
#[3,]    3    3    3

The result, viewed as a matrix, is:

matrix(x, ncol = n)
#     [,1] [,2] [,3]
#[1,]    2    1    1
#[2,]    3    3    2

Each column essentially gives the indices of the training data for one fold of a leave-one-out cross-validation.
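
The cross-validation reading can be sketched as follows, assuming n = 5 observations. The name `train_idx` is made up for this illustration.

```r
n <- 5
# Drop the diagonal positions 1, n + 2, 2n + 3, ... from rep(1:n, n),
# then reshape so that each column corresponds to one held-out observation
train_idx <- matrix(rep(1:n, n)[-seq.int(1, n * n, by = n + 1)], ncol = n)

dim(train_idx)
# [1] 4 5

# Training indices when observation 2 is held out
train_idx[, 2]
# [1] 1 3 4 5
```

Column i never contains i, so it can be used directly to subset the rows of a data set when fitting the model for fold i.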

Benchmark

f1 <- function (n) rep.int(1:n, n)[-seq.int(1, n * n, n + 1)]

## Darren Tsai's method
## the logic is also dropping diagonal elements from a matrix
## try mat[row(mat) != col(mat)] for a square matrix `mat`
## but this takes more memory
f2 <- function (n) {
  z <- 1:n
  x <- rep.int(z, n)
  x[x != rep(z, each = n)]
}

n <- 1000
library(microbenchmark)
microbenchmark("Li" = f1(n), "Tsai" = f2(n))
#Unit: milliseconds
# expr      min       lq     mean   median       uq       max
#   Li 15.14039 15.18756 19.06687 16.78678 20.44281  52.86618 
# Tsai 61.45718 62.56886 66.01448 62.86677 65.42081 107.46628
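
The `mat[row(mat) != col(mat)]` hint in the comments above can be tried directly. This is a small sketch for n = 3, not one of the benchmarked functions.

```r
n <- 3
mat <- matrix(rep(1:n, n), n)   # every column is 1, 2, ..., n

# row(mat) and col(mat) each build a full n x n integer matrix,
# which is why this variant needs more memory
mat[row(mat) != col(mat)]
# [1] 2 3 1 3 1 2
```

Because R stores matrices column by column, the off-diagonal elements come out in the same order as in the sequence asked for in the question.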

CodePudding user response：

A fast solution using sequence:

n <- 3L
sequence(rep(n, n - 1L), 1:n) %% n + 1L
#> [1] 2 3 1 3 1 2
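
A step-by-step view of why this works, as a sketch that assumes R 4.0.0 or later (where `sequence()` accepts a `from` argument). Here exactly n - 1 start values are passed, which I assume is equivalent to the `1:n` above because only one start per run is needed. The variable `s` is introduced for illustration.

```r
n <- 3L

# n - 1 runs, each of length n, starting at 1, 2, ..., n - 1
s <- sequence(rep(n, n - 1L), 1:(n - 1L))
s
#> [1] 1 2 3 2 3 4

# Wrap the values back into 1..n: k %% n + 1L maps k to its cyclic successor
s %% n + 1L
#> [1] 2 3 1 3 1 2
```

Instead of building n blocks of length n and deleting one element from each, this builds n - 1 shifted runs of length n, so the result has the right length n * (n - 1) from the start and no subsetting is needed.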

Benchmark (based on Zheyuan Li's):

f1 <- function (n) rep.int(1:n, n)[-seq.int(1, n * n, n + 1)]
f2 <- function (n) {
  z <- 1:n
  x <- rep.int(z, n)
  x[x != rep(z, each = n)]
}
f3 <- function(n) sequence(rep(n, n - 1L), 1:n) %% n + 1L
n <- 1000L
microbenchmark::microbenchmark(Li = f1(n),
                               Tsai = f2(n),
                               Blood = f3(n),
                               check = "equal")
#> Unit: milliseconds
#>   expr     min      lq      mean   median      uq     max neval
#>     Li  6.5381  6.7037  8.545134  8.53075  9.0220 32.4803   100
#>   Tsai 11.7392 12.0008 14.599620 14.00125 14.4217 43.2622   100
#>  Blood  3.0204  3.0617  3.514819  3.09375  3.2193  7.8742   100