How to generate all possible consecutive n-tuples of a vector in R?
# Input
x <- c('a', 'b', 'c', 'b')
n = 2
# Output
list(c('a', 'b'), c('b', 'c'), c('c', 'b')) # a list equal to this list
CodePudding user response:
We may remove the first and last elements and concatenate by looping over the corresponding elements with Map
Map(c, x[-length(x)], x[-1])
$a
[1] "a" "b"
$b
[1] "b" "c"
$c
[1] "c" "b"
Or cbind
to a matrix
and split by row with asplit
asplit(cbind(x[-length(x)], x[-1]), 1)
[[1]]
[1] "a" "b"
[[2]]
[1] "b" "c"
[[3]]
[1] "c" "b"
If the n
values can be more than 2, we may also do this with shift
library(data.table)
Filter(\(x) all(complete.cases(x)),
data.table::transpose(shift(x, seq_len(n)-1, type = 'lead')))
[[1]]
[1] "a" "b"
[[2]]
[1] "b" "c"
[[3]]
[1] "c" "b"
CodePudding user response:
Vectorized Base R solution (I don't use embed
or asplit
; they have a for
-loop inside).
foo <- function (x, n = 2, format = "matrix") {
m <- length(x) - n 1
y <- x[sequence(rep(m, n), 1:n)]
if (format == "matrix") matrix(y, ncol = n)
else if (format == "list") split(y, 1:m)
else stop("unknown format!")
}
foo(x, 2, "matrix")
# [,1] [,2]
#[1,] "a" "b"
#[2,] "b" "c"
#[3,] "c" "b"
foo(x, 3, "matrix")
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "b" "c" "b"
foo(x, 2, "list")
#$`1`
#[1] "a" "b"
#
#$`2`
#[1] "b" "c"
#
#$`3`
#[1] "c" "b"
foo(x, 3, "list")
#$`1`
#[1] "a" "b" "c"
#
#$`2`
#[1] "b" "c" "b"
Could you simplify the function please? Remove
format
.
A straightforward one line:
split(x[sequence(rep(length(x) - n 1, n), 1:n)], seq_len(length(x) - n 1))
CodePudding user response:
lapply(1:(length(x) - n 1), \(i) x[i:(i n - 1)])
CodePudding user response:
Here are some fun.
fun1 <- function (x, n) asplit(embed(x, n)[, n:1], 1)
fun2 <- function (x, n) split(x[sequence(rep(length(x) - n 1, n), 1:n)], seq_len(length(x) - n 1))
fun3 <- function (x, n) lapply(1:(length(x) - n 1), \(i) x[i:(i n - 1)])
library(microbenchmark)
x <- 1:10000
microbenchmark("for" = fun1(x, 2), "split" = fun2(x, 2), "lapply" = fun3(x, 2))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# for 30.536090 39.196876 49.400427 48.541195 55.481533 107.46441 100 c
# split 6.453484 7.049844 7.765709 7.647299 7.904683 13.63022 100 a
# lapply 16.070532 21.959815 26.988959 28.482102 31.133325 45.47318 100 b
microbenchmark("for" = fun1(x, 10), "split" = fun2(x, 10), "lapply" = fun3(x, 10))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# for 34.115408 34.826142 39.136366 35.631689 37.200893 200.63875 100 c
# split 8.566762 8.780026 9.255456 9.057524 9.641736 12.67383 100 a
# lapply 17.343556 17.845281 19.289687 18.301174 18.833777 28.19920 100 b
microbenchmark("for" = fun1(x, 20), "split" = fun2(x, 20), "lapply" = fun3(x, 20))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# for 38.33747 38.90368 40.61395 39.72388 40.64009 51.51035 100 c
# split 11.29013 11.39768 12.07148 11.48208 12.13088 17.46919 100 a
# lapply 18.77825 18.94005 20.88440 19.33751 19.93676 42.35469 100 b