Home > OS >  Filter based on row repetition of a range of values
Filter based on row repetition of a range of values

Time:08-25

I have a data.frame of dates and corresponding values.

date <- Sys.Date()   sort(sample(1:26, 26))

values <- c(sample(400:415, 10, replace = TRUE), 
            420, 421, 422, 420, 419, 421, 
            sample(430:435, 10, replace = TRUE))

df <- data.frame(date, values)

I want to filter the data to rows where the range 420±2 is repeated at least five times when the date is in ascending order. The final version of df would only contain rows with values of 420, 421, 422, 420, 419, 421, as the sample() data would be removed.

I am looking for both dplyr and data.table solutions.

CodePudding user response:

Here is one option with dplyr

library(dplyr)
library(data.table)
df %>% 
  mutate(values2 = between(values, 420-2, 420   2)) %>% 
  group_by(grp = rleid(values2)) %>% 
  filter(n() > 5, all(values2)) %>%
  ungroup %>%
  select(-values2, -grp)

-output

# A tibble: 6 × 2
  date       values
  <date>      <dbl>
1 2022-09-04    420
2 2022-09-05    421
3 2022-09-06    422
4 2022-09-07    420
5 2022-09-08    419
6 2022-09-09    421

Or using base R with rle

subset(df, inverse.rle(within.list(rle(values >= (420-2) &
   values <= (420   2)), {values[values & lengths < 5] <- FALSE})))
     date values
11 2022-09-04    420
12 2022-09-05    421
13 2022-09-06    422
14 2022-09-07    420
15 2022-09-08    419
16 2022-09-09    421
  • Related