Is there an R function to extract repeating rows of numbers?

I am looking to extract timepoints from a table. The output should be the starting point in seconds (from column 2) and the duration of each series, but only if the stage lasts for at least 3 minutes according to the Seconds column, i.e. if any one of the stages 0, 1, 2, 3 or 5 is repeated for more than 6 consecutive rows of the Stage column.

So in this case the 0-series does not qualify, while the following 1-series does. The desired output would be: 150, 8, i.e. starting at timepoint 150 and lasting for 8 further rows (the series covers 9 rows, from 150 to 390).

I was experimenting with rle(), but haven't been successful yet.

Stage	Seconds
0	0
0	30
0	60
0	90
0	120
1	150
1	180
1	210
1	240
1	270
1	300
1	330
1	360
1	390
0	420

CodePudding user response：

Similar to this answer, you can use data.table::rleid() with dplyr:

df <- structure(
  list(
    Stage = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L),
    Seconds = c(0L, 30L, 60L, 90L, 120L, 150L, 180L, 210L, 240L, 270L,
                300L, 330L, 360L, 390L, 420L)
  ),
  class = "data.frame",
  row.names = c(NA, -15L)
)

library(dplyr)
library(data.table)

df %>%
  filter(Seconds > 0) %>%            # drops the row at 0 seconds
  group_by(grp = rleid(Stage)) %>%   # one group id per run of identical Stage values
  filter(n() > 6)                    # keep only runs longer than 6 rows
#> # A tibble: 9 x 3
#> # Groups:   grp [1]
#>   Stage Seconds   grp
#>   <int>   <int> <int>
#> 1     1     150     2
#> 2     1     180     2
#> 3     1     210     2
#> 4     1     240     2
#> 5     1     270     2
#> 6     1     300     2
#> 7     1     330     2
#> 8     1     360     2
#> 9     1     390     2

Created on 2021-09-23 by the reprex package (v2.0.0)
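
The result above keeps every row of the qualifying run, while the question asks for one line per run (start time and duration). A minimal sketch of how the same grouping could be collapsed to that shape, assuming the example data from the question; the names `runs`, `start`, `rows` and `steps` are illustrative and not part of the original answer:

```r
library(dplyr)

# Example data from the question: five rows of stage 0, nine of stage 1, one of stage 0
df <- data.frame(
  Stage   = c(rep(0L, 5), rep(1L, 9), 0L),
  Seconds = seq(0L, by = 30L, length.out = 15)
)

runs <- df %>%
  group_by(grp = data.table::rleid(Stage)) %>%  # one id per run of identical Stage values
  summarise(
    Stage = Stage[1],       # stage of the run
    start = Seconds[1],     # time of the first row of the run
    rows  = n(),            # number of rows in the run
    steps = n() - 1L,       # number of 30-second steps after the first row
    .groups = "drop"
  ) %>%
  filter(rows > 6)          # keep runs of more than 6 consecutive rows

runs
```

For the example table this leaves a single run starting at 150 seconds with 9 rows, i.e. 8 steps after the first row, which appears to be the "150, 8" the question asks for. Calling `data.table::rleid()` with its namespace avoids attaching data.table alongside dplyr.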

CodePudding user response：

I'm not sure how representative the simulated data at the end of this answer is of yours, but this may be an option using dplyr:

library(dplyr)

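df %>%
  mutate(grp = c(0, cumsum(abs(diff(stage))))) %>%  # run id: changes whenever stage changes
  filter(stage == 1) %>%                            # this example only looks at stage 1
  group_by(grp) %>%
  mutate(count = n() - 1) %>%                       # rows in the run after its first row
  filter(row_number() == 1, count >= 6) %>%         # first row of runs with at least 7 rows
  ungroup() %>%
  select(-c(grp, stage))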

#> # A tibble: 4 x 2
#>   seconds count
#>     <dbl> <dbl>
#> 1     960    16
#> 2    1500     7
#> 3    2040    17
#> 4    2670    10

Created on 2021-09-23 by the reprex package (v2.0.0)

Data:

set.seed(123)

df <- data.frame(stage = sample(c(0, 1), 100, replace = TRUE, prob = c(0.2, 0.8)),
                 seconds = seq(0, by = 30, length.out = 100))
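
Since the question mentions rle(), here is a rough base R sketch of the same idea without any packages, assuming the example table from the question. The names `first_row`, `runs` and `long_runs` are made up for illustration:

```r
# Example data from the question
df <- data.frame(
  Stage   = c(rep(0, 5), rep(1, 9), 0),
  Seconds = seq(0, by = 30, length.out = 15)
)

r <- rle(df$Stage)  # r$values: stage of each run, r$lengths: rows in each run

# Row index at which each run starts: 1, then 1 + cumulative lengths of earlier runs
first_row <- cumsum(c(1, head(r$lengths, -1)))

runs <- data.frame(
  stage = r$values,
  start = df$Seconds[first_row],  # start time of each run in seconds
  rows  = r$lengths               # number of consecutive rows in each run
)

# Keep only runs of more than 6 consecutive rows
long_runs <- runs[runs$rows > 6, ]
long_runs
```

Unlike the `filter(stage == 1)` step above, this treats every stage value the same way, so runs of stage 0, 2, 3 or 5 would be reported as well if they are long enough.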