Home > Enterprise >  Find occurrences of certain value in vector conditionally by the length of the occurrence & subset t
Find occurrences of certain value in vector conditionally by the length of the occurrence & subset t

Time:06-09

I have a following dummy vector:

dummy_vector <- c(1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

I would like to do the following:

  1. find the occurences of 1's if they are appearing more than 3 times in a row
  2. cut the part of the vector which would be flanking this ocurrence of 1's by 10 positions (values) to the left and 10 positions to the right from the center of this string of 1's - the resulting fragment should have fixed length of 20, containing 1's in the middle.
  3. If the ones would be at the beginning of the vector, the function should add NA's to the left side of the string.
  4. append the resulting vector fragments to a list.

And here is my desired output:

first <- c(NA,NA,NA,NA,NA,NA,NA,NA,1,1,1,1,0,0,0,0,0,0,0,0)
second <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0)


desired_output <- list(first, second)

First part I can do using filtering with rle() but I can't do the rest. Any help would be appreciated.

CodePudding user response:

You may try

fn <- function(x) {
  res <- list()
  y <- rle(x == 1)
  key <- which(y$lengths>3 & y$values)
  z <- y$lengths
  for (i in key){
    a = ceiling((20 - z[i])/2)
    b = z[i]
    c = floor((20-z[i])/2)
    if (i == 1){
      res[[match(i,key)]] <- c(rep(NA, a), rep(1, b), rep(0, c))
    } else {
      res[[match(i,key)]] <- c(rep(0, a), rep(1, b), rep(0, c))
    }
  }
  return(res)
}

fn(dummy_vector)

CodePudding user response:

You could use:

a <- paste0(dummy_vector, collapse = "") #create a string
b <- gregexpr("1{4,}", a)[[1]]           #obtain the positions with 4  1's
len <- attr(b, 'match.length')           #Obtain the current length
start <- b    floor(len/2) - 10          #extraction starting position
d <- sprintf(" s", substring(a, start,  start   19)) #Extract 20
lapply(strsplit(d, ''), as.numeric)     #Make numeric

[[1]]
 [1] NA NA NA NA NA NA NA  1  1  1  1  0  0  0  0  0  0  0   0

[[2]]
 [1] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 

CodePudding user response:

Using numerics:

a <- rle(dummy_vector)
condition <- a$lengths > 3 & a$values
b <- cumsum(c(1, head(a$lengths,-1)))[condition]
len <- a$lengths[condition]
start <-  b    floor(len/2) - 10
end <- start   19
Map(\(i, j)c(rep(NA, 20-length(x <- dummy_vector[i:j])),x),
    pmax(start, 1), pmin(end, length(dummy_vector)))

[[1]]
 [1] NA NA NA NA NA NA NA NA  1  1  1  1  0  0  0  0  0  0  0  0

[[2]]
 [1] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0
  • Related