I am trying to extract time points from a table. The output should be the starting point in seconds (from column 2) and the duration of each series, but only if the stage lasts for at least 3 minutes (judging by the Seconds column), that is, if one of the stages 0, 1, 2, 3 or 5 repeats for more than 6 consecutive rows of the Stage column.
So in this case the 0-series does not qualify, while the following 1-series does. The desired output would be: 150, 8 (starting at time point 150 and lasting for 8 rows).
I was experimenting with rle(), but haven't been successful yet.
Stage | Seconds |
---|---|
0 | 0 |
0 | 30 |
0 | 60 |
0 | 90 |
0 | 120 |
1 | 150 |
1 | 180 |
1 | 210 |
1 | 240 |
1 | 270 |
1 | 300 |
1 | 330 |
1 | 360 |
1 | 390 |
0 | 420 |
CodePudding user response:
Similar to this answer, you can use data.table::rleid() with dplyr:
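df <- structure(
  list(
    Stage = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L),
    Seconds = c(0L, 30L, 60L, 90L, 120L, 150L, 180L, 210L, 240L, 270L, 300L, 330L, 360L, 390L, 420L)
  ),
  class = "data.frame",
  row.names = c(NA, -15L)
)

library(dplyr)
library(data.table)

df %>%
  filter(Seconds > 0) %>%            # drops the row at time 0; does not change the result for this data
  group_by(grp = rleid(Stage)) %>%   # rleid() gives each run of identical Stage values its own id
  filter(n() > 6)                    # keep only runs of more than 6 consecutive rows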
#> # A tibble: 9 x 3
#> # Groups:   grp [1]
#>   Stage Seconds   grp
#>   <int>   <int> <int>
#> 1     1     150     2
#> 2     1     180     2
#> 3     1     210     2
#> 4     1     240     2
#> 5     1     270     2
#> 6     1     300     2
#> 7     1     330     2
#> 8     1     360     2
#> 9     1     390     2
Created on 2021-09-23 by the reprex package (v2.0.0)
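The pipeline above returns every row of the qualifying run. If only the start time and the length of each run are wanted, as the question asks, the same grouping could be collapsed with summarise(). This is a minimal sketch rather than part of the original answer: the column names start, rows and steps are made up for illustration, and steps = rows - 1 is only an assumption about how the question arrives at 8 for a run that spans nine rows.

```r
library(dplyr)

# Same example data as in the question, rebuilt compactly.
df <- data.frame(
  Stage   = c(rep(0L, 5), rep(1L, 9), 0L),
  Seconds = seq(0, by = 30, length.out = 15)
)

runs <- df %>%
  group_by(grp = data.table::rleid(Stage)) %>%  # one id per run of identical stages
  summarise(
    Stage = Stage[1],      # stage value of the run
    start = Seconds[1],    # first time point of the run
    rows  = n(),           # number of rows in the run
    steps = n() - 1,       # assumed: number of 30-second steps after the first row
    .groups = "drop"
  ) %>%
  filter(rows > 6)         # same threshold as above: more than 6 consecutive rows

print(runs)
```

Calling data.table::rleid() with the namespace prefix avoids attaching data.table, so none of its functions mask dplyr's.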
CodePudding user response:
Not sure how representative of your data this might be, but this may be an option using dplyr (sample data at the end):
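library(dplyr)

df %>%
  mutate(grp = c(0, cumsum(abs(diff(stage))))) %>%  # run id: increases whenever stage changes
  filter(stage == 1) %>%                            # keeps only runs of stage 1
  group_by(grp) %>%
  mutate(count = n() - 1) %>%                       # rows in the run after its first row
  filter(row_number() == 1, count >= 6) %>%         # first row of each run with at least 7 rows
  ungroup() %>%
  select(-c(grp, stage))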
#> # A tibble: 4 x 2
#>   seconds count
#>     <dbl> <dbl>
#> 1     960    16
#> 2    1500     7
#> 3    2040    17
#> 4    2670    10
Created on 2021-09-23 by the reprex package (v2.0.0)
Data:
set.seed(123)
df <- data.frame(
  stage = sample(c(0, 1), 100, replace = TRUE, prob = c(0.2, 0.8)),
  seconds = seq(0, by = 30, length.out = 100)
)
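Since the question mentions rle(), the same idea can also be sketched in base R without any packages. This is an illustrative sketch, not taken from either answer: find_long_runs() and its min_rows argument are hypothetical names, and the default of 7 rows encodes the "more than 6 consecutive rows" reading of the question. Unlike the pipeline above, it considers runs of every stage value, not only stage 1.

```r
# Hypothetical helper: start time and length of every run of identical
# stage values that spans at least `min_rows` rows.
find_long_runs <- function(stage, seconds, min_rows = 7) {
  r      <- rle(stage)              # run lengths and the value of each run
  ends   <- cumsum(r$lengths)       # index of the last row of each run
  starts <- ends - r$lengths + 1    # index of the first row of each run
  keep   <- r$lengths >= min_rows   # runs that are long enough
  data.frame(
    stage = r$values[keep],
    start = seconds[starts[keep]],
    rows  = r$lengths[keep]
  )
}

# Example data from the question: five rows of stage 0, nine of stage 1, one of stage 0.
stage   <- c(rep(0, 5), rep(1, 9), 0)
seconds <- seq(0, by = 30, length.out = 15)

res <- find_long_runs(stage, seconds)
print(res)
```

For the question's table this keeps only the stage 1 run that starts at 150 seconds; the 0-series is dropped because it has only five rows.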