I have several sequences that I want to break into series of adjacent numbers (sliding windows). The sequences are stored in a list, one per individual, and the size of the window that contains the adjacent numbers varies from one individual to another. Here are some example data:
#The sequences of three individuals
sequences <- list(c(1,2,3,5,6), c(2,3,4,5,6), c(1,3,4,6,7))
#The window size that contains the adjacent numbers
#For the first individual, 2 adjacent numbers should be grouped together; for the second, 3; for the third, 4.
windowsize <- list(2,3,4)
#The breakdown of the adjacent numbers should look like:
[[1]]
[[1]][[1]]
[1] 1 2
[[1]][[2]]
[1] 2 3
[[1]][[3]]
[1] 3 5
[[1]][[4]]
[1] 5 6
[[2]]
[[2]][[1]]
[1] 2 3 4
[[2]][[2]]
[1] 3 4 5
[[2]][[3]]
[1] 4 5 6
[[3]]
[[3]][[1]]
[1] 1 3 4 6
[[3]][[2]]
[1] 3 4 6 7
My real dataset is much larger than this, so I think writing a function may be the way to achieve this. Thank you!
CodePudding user response:
We may use Map with embed from base R:
- Loop over the corresponding elements of 'sequences' and 'windowsize' in Map, create a matrix with embed with its dimension specified as the element (y) from 'windowsize', and use asplit to split it by row (MARGIN = 1).
Map(function(x, y) asplit(embed(x, y)[, y:1, drop = FALSE], 1), sequences, windowsize)
-output
[[1]]
[[1]][[1]]
[1] 1 2
[[1]][[2]]
[1] 2 3
[[1]][[3]]
[1] 3 5
[[1]][[4]]
[1] 5 6
[[2]]
[[2]][[1]]
[1] 2 3 4
[[2]][[2]]
[1] 3 4 5
[[2]][[3]]
[1] 4 5 6
[[3]]
[[3]][[1]]
[1] 1 3 4 6
[[3]][[2]]
[1] 3 4 6 7
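To see why the columns are reversed with `[, y:1]`, here is a minimal sketch of what embed returns for the first sequence on its own. The variable names `m` and `windows` are illustrative only; `drop = FALSE` is added so the result stays a matrix even when there is a single window.

```r
x <- c(1, 2, 3, 5, 6)

# embed() puts the most recent value first in each row,
# so every window comes out in reverse order.
m <- embed(x, 2)
m
#      [,1] [,2]
# [1,]    2    1
# [2,]    3    2
# [3,]    5    3
# [4,]    6    5

# Reversing the column order restores the original left-to-right order.
windows <- m[, 2:1, drop = FALSE]
windows
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    3
# [3,]    3    5
# [4,]    5    6
```

Passing `windows` to `asplit(windows, 1)` then gives one list element per row, which is the nested-list shape asked for in the question.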
If we want a matrix, just remove the asplit (drop = FALSE keeps the result a matrix even when there is only one window):
Map(function(x, y) embed(x, y)[, y:1, drop = FALSE], sequences, windowsize)
[[1]]
     [,1] [,2]
[1,]    1    2
[2,]    2    3
[3,]    3    5
[4,]    5    6
[[2]]
     [,1] [,2] [,3]
[1,]    2    3    4
[2,]    3    4    5
[3,]    4    5    6
[[3]]
     [,1] [,2] [,3] [,4]
[1,]    1    3    4    6
[2,]    3    4    6    7
CodePudding user response:
We don't really need to create a list of lists here, since the sublists are all rectangular; I suggest creating a list of matrices instead. This uses sequences and windowsize from the question, although windowsize could be represented by a plain numeric vector, which probably makes more sense. If you want lists anyway, use the commented-out f instead.
library(zoo)
# split2 <- function(x) split(x, 1:nrow(x))
# f <- function(x, w) split2(rollapply(x, w, c))
f <- function(x, w) rollapply(x, w, c)
Map(f, sequences, windowsize)
giving:
[[1]]
     [,1] [,2]
[1,]    1    2
[2,]    2    3
[3,]    3    5
[4,]    5    6
[[2]]
     [,1] [,2] [,3]
[1,]    2    3    4
[2,]    3    4    5
[3,]    4    5    6
[[3]]
     [,1] [,2] [,3] [,4]
[1,]    1    3    4    6
[2,]    3    4    6    7
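Since the question asks about wrapping this in a function for a larger dataset, here is a rough base R sketch that needs no extra packages. The helper name `sliding_windows` and its handling of too-short sequences (returning a zero-row matrix) are assumptions made for illustration, not part of either answer. It also shows windowsize as a plain numeric vector, which Map accepts just as well as a list.

```r
# Hypothetical helper: one row per window of w adjacent values of x.
sliding_windows <- function(x, w) {
  stopifnot(length(w) == 1, w >= 1)
  n_windows <- length(x) - w + 1
  # Assumption: a sequence shorter than the window yields an empty matrix.
  if (n_windows < 1) {
    return(matrix(x[0], nrow = 0, ncol = w))
  }
  # idx[i, j] is the position in x of the j-th value of the i-th window.
  idx <- outer(seq_len(n_windows), seq_len(w) - 1, "+")
  matrix(x[as.vector(idx)], nrow = n_windows, ncol = w)
}

sequences  <- list(c(1, 2, 3, 5, 6), c(2, 3, 4, 5, 6), c(1, 3, 4, 6, 7))
windowsize <- c(2, 3, 4)  # numeric vector instead of a list

# One matrix per individual.
result <- Map(sliding_windows, sequences, windowsize)

# Optional: convert each matrix to a list of row vectors.
result_lists <- lapply(result, function(m) asplit(m, 1))
```

Building the index matrix with outer avoids an explicit loop over windows, and the result keeps its matrix shape even when the window size is 1 or there is only one window.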