Home > Software engineering >  Dplyr sequence matching
Dplyr sequence matching

Time:11-24

I am struggling to write a simple dplyr code for this problem. If values per id are equal and timeslot follows a consecutive and increasing sequence per day I would like to create a t column that is a count of the sequence length. For example, in the case of id 1 there is an increasing consecutive sequence on day 1 starting from timeslot 1 till timeslot 4, so t would be equal 4. There are 144 timeslots and 7 days. How can I do this?

Desired output:

enter image description here

Sample data:



    structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
    variable = c("ha1_001", "ha1_002", "ha1_003", "ha1_004", 
    "ha1_125", "ha1_126", "ha1_127", "ha1_128", "ha1_009", "ha1_010", 
    "ha1_011", "ha1_012", "ha1_013"), value = c(110, 110, 110, 
    110, 110, 110, 110, 110, 110, 110, 110, 110, 110), day = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), timeslot = c(1, 2, 3, 
    4, 125, 126, 127, 128, 129, 130, 131, 132, 133), n = c(7, 
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), variable = structure(list(), class = c("collector_character", 
    "collector")), value = structure(list(), class = c("collector_double", 
    "collector")), day = structure(list(), class = c("collector_double", 
    "collector")), timeslot = structure(list(), class = c("collector_double", 
    "collector")), n = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

CodePudding user response:

One possible solution can be,

library(dplyr)

df %>% 
 group_by(id, new = cumsum(c(1, diff(timeslot)) != 1)) %>% 
 mutate(t = n())

# A tibble: 13 x 7
# Groups:   new [2]
      id variable value   day timeslot     n     t
   <dbl> <chr>    <dbl> <dbl>    <dbl> <dbl> <int>
 1     1 ha1_001    110     1        1     7     4
 2     1 ha1_002    110     1        2     7     4
 3     1 ha1_003    110     1        3     7     4
 4     1 ha1_004    110     1        4     7     4
 5     1 ha1_125    110     1      125     7     9
 6     1 ha1_126    110     1      126     7     9
 7     1 ha1_127    110     1      127     7     9
 8     1 ha1_128    110     1      128     7     9
 9     1 ha1_009    110     1      129     7     9
10     1 ha1_010    110     1      130     7     9
11     1 ha1_011    110     1      131     7     9
12     1 ha1_012    110     1      132     7     9
13     1 ha1_013    110     1      133     7     9
  • Related