I have a data.frame of dates and corresponding values.
date <- Sys.Date() + sort(sample(1:26, 26))
values <- c(sample(400:415, 10, replace = TRUE),
420, 421, 422, 420, 419, 421,
sample(430:435, 10, replace = TRUE))
df <- data.frame(date, values)
I want to filter the data to rows where a value in the range 420±2 occurs at least five times consecutively when date is in ascending order. The final version of df would contain only the rows with values of 420, 421, 422, 420, 419, 421, as the sample() data would be removed.
I am looking for both dplyr and data.table solutions.
CodePudding user response:
Here is one option with dplyr (rleid() comes from data.table):
library(dplyr)
library(data.table)
df %>%
  mutate(values2 = between(values, 420 - 2, 420 + 2)) %>%
group_by(grp = rleid(values2)) %>%
  filter(n() >= 5, all(values2)) %>%
ungroup %>%
select(-values2, -grp)
-output
# A tibble: 6 × 2
  date       values
  <date>      <dbl>
1 2022-09-04    420
2 2022-09-05    421
3 2022-09-06    422
4 2022-09-07    420
5 2022-09-08    419
6 2022-09-09    421
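To illustrate the grouping step, here is a minimal sketch of how run ids are assigned. The in_range vector is hypothetical toy data, not taken from the question; rleid() is from data.table and rle() is from base R.

```r
library(data.table)

# Toy in-range flags (hypothetical data, not from the question)
in_range <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)

# rleid() gives every run of identical consecutive values its own id
run_id <- rleid(in_range)

# rle() from base R describes the same runs as lengths and values
runs <- rle(in_range)

print(run_id)        # 1 1 2 2 2 3
print(runs$lengths)  # 2 3 1
print(runs$values)   # FALSE TRUE FALSE
```

Grouping by the run id and keeping only groups that are entirely TRUE and long enough is what isolates the consecutive in-range rows.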
Or using base R with rle:
# Set in-range runs shorter than 5 to FALSE, then expand the runs back to a row mask
subset(df, inverse.rle(within.list(rle(values >= (420 - 2) &
    values <= (420 + 2)), {values[values & lengths < 5] <- FALSE})))
         date values
11 2022-09-04    420
12 2022-09-05    421
13 2022-09-06    422
14 2022-09-07    420
15 2022-09-08    419
16 2022-09-09    421
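Since the question also asks for data.table, the same idea could be sketched there as follows. This is one possible approach under stated assumptions, not the only way to do it: the names target, tol, min_run, in_range and grp are introduced here for illustration, and the seed is only there to make the example reproducible.

```r
library(data.table)

# Rebuild data shaped like the question's (the seed is an assumption)
set.seed(1)
date <- Sys.Date() + 1:26
values <- c(sample(400:415, 10, replace = TRUE),
            420, 421, 422, 420, 419, 421,
            sample(430:435, 10, replace = TRUE))
dt <- data.table(date, values)

# Illustrative parameters: centre of the band, tolerance, minimum run length
target <- 420
tol <- 2
min_run <- 5

# Make sure rows are in ascending date order before looking for runs
setorder(dt, date)

# Flag rows inside target ± tol, then give each run of equal flags an id
dt[, in_range := values >= target - tol & values <= target + tol]
dt[, grp := rleid(in_range)]

# Keep only runs that are in range and at least min_run rows long
res <- dt[, if (in_range[1L] && .N >= min_run) .SD, by = grp][, .(date, values)]
print(res)
```

Within a run every in_range flag is identical, so checking the first element is enough; groups that fail the condition return NULL and are dropped from the result.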