How to count sequential repeats of a value (allowing for one skip)

Time:02-11

I am trying to produce a variable that counts how many times a "1" has appeared sequentially in the preceding rows of a different variable. However, I need the count to persist even if one row is missing a 1 (i.e., 10111011 should register as an 8).

The following code shows how I currently count sequential 1s, using an example of the kind of data I'm working with:

input <- c(1,0,1,1,0,1,1,0,1,0,1)
dfseq <- data.frame(input)
# number the rows within each run of identical values (runs of 0s are numbered too)
dfseq$seq <- sequence(rle(as.character(dfseq$input))$lengths)

which produces the following dataframe:

data_struc <-
  structure(list(
    input = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1),
    seq = c(1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L)
  ),
  row.names = c(NA,-11L),
  class = "data.frame")

However, I want the sequence to allow for one row of "failure", so that it keeps counting consecutive 1s even if one row contains a 0, as long as the 1s then continue. It should only stop counting once two 0s appear consecutively.
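A minimal base-R sketch of the rule as stated, under assumptions the question does not pin down: a 0 only extends the streak when it sits between two 1s, rows that break the streak get a count of 0, and there is nothing before the first row or after the last one. `count_one_skip()` is a hypothetical helper written for illustration, not an existing function.

```r
# Hypothetical helper: running count of a streak of 1s that tolerates
# a single 0 between two 1s and restarts after a "00".
count_one_skip <- function(x) {
  n <- length(x)
  out <- integer(n)  # rows that break the streak keep the value 0
  run <- 0L
  for (i in seq_len(n)) {
    prev <- if (i > 1) x[i - 1] else 0  # assume nothing before row 1
    nxt  <- if (i < n) x[i + 1] else 0  # assume nothing after the last row
    if (x[i] == 1 || (prev == 1 && nxt == 1)) {
      run <- run + 1L  # a 1, or a lone 0 flanked by 1s, extends the streak
      out[i] <- run
    } else {
      run <- 0L        # a 0 next to another 0 (or at an edge) resets the count
    }
  }
  out
}

count_one_skip(c(1, 0, 1, 1, 1, 0, 1, 1))
# -> 1 2 3 4 5 6 7 8
count_one_skip(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1))
# -> 1 2 3 4 0 0 1 2 3 4 5 6
```

Under these assumptions the 10111011 example ends at 8, and the "00" in the second call restarts the count at the next 1.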

Answer:

I'd use a lagged variable with an OR condition:

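library(dplyr)
dfseq %>% mutate(
  # a row extends the count if it is a 1, or a lone 0 between two 1s;
  # lag(default = 0) keeps a leading 0 from turning the whole cumsum() into NA.
  # Note: cumsum() never resets, so rows after a "00" continue from the previous total.
  cum_result = cumsum(input == 1 | (lag(input, default = 0) == 1 & lead(input, default = 1) == 1))
)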
#    input seq cum_result
# 1      1   1          1
# 2      0   1          2
# 3      1   1          3
# 4      1   2          4
# 5      0   1          5
# 6      1   1          6
# 7      1   2          7
# 8      0   1          8
# 9      1   1          9
# 10     0   1         10
# 11     1   1         11
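Because `cumsum()` only ever grows, the count above never restarts after two 0s. A minimal base-R sketch of one way to add the reset, assuming the same OR condition as this answer (a 0 is kept only when flanked by 1s, and a missing next row counts as a 1, like `lead(..., default = 1)`): every row that fails the condition starts a new streak id, and the cumulative sum is taken within each streak. The names `prev`, `nxt`, `keep`, and `streak` are illustrative.

```r
# sample containing a "00" break
x <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)

prev <- c(0, head(x, -1))  # previous row, 0 before the first row (like lag(x, default = 0))
nxt  <- c(tail(x, -1), 1)  # next row, 1 after the last row (like lead(x, default = 1))

# TRUE for a 1, or for a lone 0 sitting between two 1s
keep <- x == 1 | (prev == 1 & nxt == 1)

# every row that fails the condition bumps the streak id, so counting restarts
streak <- cumsum(!keep)

# running count within each streak; rows that break the streak get 0
cum_result <- ave(as.integer(keep), streak, FUN = cumsum)
cum_result
# -> 1 2 3 4 0 0 1 2 3 4 5 6
```

On the question's 11-row `input` every row satisfies the condition, so this gives 1 to 11, the same as `cum_result` above.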

Answer:

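You were on the right track using rle. Here is an extended dataset (see Data below) to illustrate the "allowing" part:

rle_obj <- rle(dfseq$input)

# number of 1s plus the number of zero runs of length 1 (single skipped rows);
# this is one total for the whole column, not a per-row running count
sum(dfseq$input) + sum(ifelse(rle_obj$lengths[rle_obj$values == 0] == 1, 1, 0))
[1] 12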

Data

dfseq <- structure(list(input = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 
0, 1), seq = c(1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1)), row.names = c(NA, 
-14L), class = "data.frame")

dfseq
   input seq
1      1   1
2      0   1
3      1   1
4      1   2
5      0   1
6      1   1
7      1   2
8      0   1
9      1   1
10     0   1
11     1   1
12     0   1
13     0   2
14     1   1
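The total above lumps every streak together, even though rows 12 and 13 form a "00" that should stop the count. A minimal sketch that still leans on `rle()` but reports the length of each tolerant streak separately, assuming a lone 0 only counts when it sits between two 1s; `keep` and `streak_lengths` are illustrative names.

```r
# input column of the extended dfseq above
x <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1)

prev <- c(0, head(x, -1))  # previous value, 0 before the first row
nxt  <- c(tail(x, -1), 0)  # next value, 0 after the last row

# TRUE for a 1, or for a lone 0 between two 1s
keep <- x == 1 | (prev == 1 & nxt == 1)

# runs of TRUE in keep are the tolerant streaks
runs <- rle(keep)
streak_lengths <- runs$lengths[runs$values]
streak_lengths
# -> 11 1
```

Here the "00" in rows 12 and 13 splits the data into a streak of 11 rows and a streak of 1 row; for this particular data their sum matches the 12 above.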

Answer:

I'm not sure we all understood the question correctly. The sample data doesn't clarify much, and it can hide mistakes because every row follows the sequence as the OP describes it. To be sure, the OP could provide the desired outcome for a sample that includes records that break the sequence according to their criteria.

I changed the sample data a bit, and this is how I interpreted the question:

dt <- data.frame(
  input = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)
)

library(data.table)
setDT(dt)

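# shift() is data.table's lag/lead; dplyr's lag()/lead() are not loaded here
dt[, seq_count := seq_len(.N), by = rleid(input == 1 | (shift(input) == 1 & shift(input, type = "lead") != 0))]
# blank out 0s that are part of a "00" pair
dt[input == 0 & (shift(input, type = "lead") == 0 | shift(input) == 0), seq_count := NA]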

dt

#     input seq_count
#  1:     1         1
#  2:     0         2
#  3:     1         3
#  4:     1         4
#  5:     0        NA
#  6:     0        NA
#  7:     1         1
#  8:     0         2
#  9:     1         3
# 10:     1         4
# 11:     0         5
# 12:     1         6