How to skip a certain number of rows after a max index in R

I am trying to skip a certain number of rows after the max index of each class section. My dataset looks like the one below. I would like to skip the 2 rows after the max index of each section where class = 4 or 2. The dataset has two sections of 4s, and I would like to remove the 2 rows after the max index of both. There is also one section of 2s, so I would like to remove the 2 rows after its max index as well. This is just an example, though; there could be multiple sections of 4s or 2s.

------------------
|  rate   | class |
------------------
|   0.5   |   9   |
------------------
|   0.7   |   9   |
------------------
|   0.6   |   4   |
------------------
|   0.5   |   4   |
------------------
|   0.3   |   4   |
------------------
|   0.9   |   4   |
------------------
|   0.8   |   1   |
------------------
|   0.6   |   1   |
------------------
|   0.3   |   1   |
------------------
|   0.3   |   1   |
------------------
|   0.2   |   4   |
------------------
|   0.1   |   4   |
------------------
|   0.2   |   3   |
------------------
|   0.4   |   3   |
------------------
|   0.9   |   3   |
------------------
|   1.0   |   2   |
------------------
|   0.7   |   2   |
------------------
|   0.8   |   1   |
------------------
|   0.9   |   1   |
------------------
|   0.6   |   9   |
------------------

The desired output would look like this:

------------------
|  rate   | class |
------------------
|   0.5   |   9   |
------------------
|   0.7   |   9   |
------------------
|   0.6   |   4   |
------------------
|   0.5   |   4   |
------------------
|   0.3   |   4   |
------------------
|   0.9   |   4   |
------------------
|   0.8   |   1   |
------------------
|   0.6   |   1   |
------------------
|   0.2   |   4   |
------------------
|   0.1   |   4   |
------------------
|   0.2   |   3   |
------------------
|   1.0   |   2   |
------------------
|   0.7   |   2   |
------------------
|   0.6   |   9   |
------------------

Here is the code for my dataset:

dd <- data.frame(rate = c(0.5,0.7,0.6,0.5,0.3,0.9,0.8,0.6,0.3,0.3,0.2,0.1,0.2,0.4,0.9,1.0,0.7,0.8,0.9,0.6),
                 class = c(9,9,4,4,4,4,1,1,1,1,4,4,3,3,3,2,2,1,1,9))

I really appreciate your time and effort!

CodePudding user response：

library(dplyr)
dd %>%
  filter(
    !(class != 4 &
     (
       lag(class, n = 1, default = 0) == 4 |
         lag(class, n = 2, default = 0) == 4
       )
    )
  )
#    rate class
# 1   0.5     9
# 2   0.7     9
# 3   0.6     4
# 4   0.5     4
# 5   0.3     4
# 6   0.9     4
# 7   0.3     1
# 8   0.3     1
# 9   0.2     4
# 10  0.1     4
# 11  0.9     3
# 12  1.0     2
# 13  0.7     2
# 14  0.8     1
# 15  0.9     1
# 16  0.6     9

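This filters out rows where the class is not 4 AND (the previous row is 4 OR the row two before it is 4). The default for the lags is set to 0, a value that is not 4, so the first rows are not filtered out. As written, it checks only for class 4, so the 2 rows after the section of 2s are kept.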
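Since the question asks about sections of both 4s and 2s, the same lag-based filter could be extended with %in%. The following is only a sketch under that assumption; the name targets is introduced here for illustration and is not part of the original answer.

```r
library(dplyr)

dd <- data.frame(
  rate  = c(0.5, 0.7, 0.6, 0.5, 0.3, 0.9, 0.8, 0.6, 0.3, 0.3,
            0.2, 0.1, 0.2, 0.4, 0.9, 1.0, 0.7, 0.8, 0.9, 0.6),
  class = c(9, 9, 4, 4, 4, 4, 1, 1, 1, 1, 4, 4, 3, 3, 3, 2, 2, 1, 1, 9)
)

# Hypothetical name: the classes whose sections trigger the skip
targets <- c(2, 4)

res <- dd %>%
  filter(
    # Drop a row when it is not a target class itself AND
    # one of the two preceding rows is a target class.
    !(!(class %in% targets) &
        (lag(class, n = 1, default = 0) %in% targets |
           lag(class, n = 2, default = 0) %in% targets))
  )
res
```

For this dataset the sketch drops the 2 rows after each section of 4s and after the section of 2s, leaving 14 rows. One design consequence worth noting: a row whose own class is 2 or 4 is never dropped, so a section of 2s that directly follows a section of 4s would be kept in full.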

CodePudding user response：

Use run-length encoding to find the end point of each run of 2s or 4s, then remove the 2 rows after each end point:

endpts <- with(rle(dd$class), cumsum(lengths)[values %in% c(2,4)] )
dd[-(endpts + rep(seq(2), each=length(endpts))),]
#   rate class
#1   0.5     9
#2   0.7     9
#3   0.6     4
#4   0.5     4
#5   0.3     4
#6   0.9     4
#9   0.3     1
#10  0.3     1
#11  0.2     4
#12  0.1     4
#15  0.9     3
#16  1.0     2
#17  0.7     2
#20  0.6     9
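The run-length approach above could be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: the name drop_after_runs and its arguments are hypothetical, and it assumes the class column is a plain numeric or character vector (rle() does not accept factors).

```r
dd <- data.frame(
  rate  = c(0.5, 0.7, 0.6, 0.5, 0.3, 0.9, 0.8, 0.6, 0.3, 0.3,
            0.2, 0.1, 0.2, 0.4, 0.9, 1.0, 0.7, 0.8, 0.9, 0.6),
  class = c(9, 9, 4, 4, 4, 4, 1, 1, 1, 1, 4, 4, 3, 3, 3, 2, 2, 1, 1, 9)
)

# Hypothetical helper: drop the `n` rows that follow each run of `classes`.
drop_after_runs <- function(df, classes, n = 2, col = "class") {
  runs <- rle(df[[col]])
  # Last row index of every run whose value is one of `classes`
  endpts <- cumsum(runs$lengths)[runs$values %in% classes]
  # Row indices 1..n after each end point
  drop <- as.vector(outer(endpts, seq_len(n), "+"))
  # Ignore indices past the end of the data
  drop <- drop[drop <= nrow(df)]
  # Guard: df[-integer(0), ] would select zero rows, not all rows
  if (length(drop) == 0) return(df)
  df[-drop, , drop = FALSE]
}

res <- drop_after_runs(dd, classes = c(2, 4), n = 2)
res
```

The guard for an empty index vector matters because negative indexing with a zero-length vector returns an empty data frame, so a dataset with no 2s or 4s would otherwise lose every row.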