I am trying to skip a certain number of rows after the max index (the last row) of each class section. My dataset is shown below. I would like to skip the 2 rows after the max index of every section where class = 4 or 2. The dataset has two sections of 4s, so I would like to remove the 2 rows after the max index of both sections. There is also one section of 2s, so I would like to remove the 2 rows after its max index as well. This is just an example, though; there could be any number of sections of 4s or 2s.
------------------
| rate | class |
------------------
| 0.5 | 9 |
------------------
| 0.7 | 9 |
------------------
| 0.6 | 4 |
------------------
| 0.5 | 4 |
------------------
| 0.3 | 4 |
------------------
| 0.9 | 4 |
------------------
| 0.8 | 1 |
------------------
| 0.6 | 1 |
------------------
| 0.3 | 1 |
------------------
| 0.3 | 1 |
------------------
| 0.2 | 4 |
------------------
| 0.1 | 4 |
------------------
| 0.2 | 3 |
------------------
| 0.4 | 3 |
------------------
| 0.9 | 3 |
------------------
| 1.0 | 2 |
------------------
| 0.7 | 2 |
------------------
| 0.8 | 1 |
------------------
| 0.9 | 1 |
------------------
| 0.6 | 9 |
------------------
The desired output would look like this:
------------------
| rate | class |
------------------
| 0.5 | 9 |
------------------
| 0.7 | 9 |
------------------
| 0.6 | 4 |
------------------
| 0.5 | 4 |
------------------
| 0.3 | 4 |
------------------
| 0.9 | 4 |
------------------
| 0.8 | 1 |
------------------
| 0.6 | 1 |
------------------
| 0.2 | 4 |
------------------
| 0.1 | 4 |
------------------
| 0.2 | 3 |
------------------
| 1.0 | 2 |
------------------
| 0.7 | 2 |
------------------
| 0.6 | 9 |
------------------
Here is the code for my dataset:
dd <- data.frame(rate = c(0.5,0.7,0.6,0.5,0.3,0.9,0.8,0.6,0.3,0.3,0.2,0.1,0.2,0.4,0.9,1.0,0.7,0.8,0.9,0.6),
class = c(9,9,4,4,4,4,1,1,1,1,4,4,3,3,3,2,2,1,1,9))
I really appreciate your time and effort!
CodePudding user response:
library(dplyr)
dd %>%
  filter(
    !(class != 4 &
      (
        lag(class, n = 1, default = 0) == 4 |
          lag(class, n = 2, default = 0) == 4
      )
    )
  )
# rate class
# 1 0.5 9
# 2 0.7 9
# 3 0.6 4
# 4 0.5 4
# 5 0.3 4
# 6 0.9 4
# 7 0.3 1
# 8 0.3 1
# 9 0.2 4
# 10 0.1 4
# 11 0.9 3
# 12 1.0 2
# 13 0.7 2
# 14 0.8 1
# 15 0.9 1
# 16 0.6 9
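We filter out rows where the class is not 4 AND (the previous row is 4 OR the 2nd previous row is 4). The default for the lags is set to 0, a value that is not 4, so the first rows are not filtered out. Note that this only handles class 4: the two rows after the section of 2s (0.8 and 0.9) are still in the output above.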
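The same lag logic can probably be extended to cover both class 2 and class 4 by swapping the == 4 comparisons for %in% checks. Below is a minimal sketch of that idea, written in base R so it runs without extra packages. The helper shift_down is a hypothetical stand-in for dplyr::lag, and the names targets, after_target and res_lag are made up for illustration.

```r
dd <- data.frame(rate = c(0.5,0.7,0.6,0.5,0.3,0.9,0.8,0.6,0.3,0.3,0.2,0.1,0.2,0.4,0.9,1.0,0.7,0.8,0.9,0.6),
                 class = c(9,9,4,4,4,4,1,1,1,1,4,4,3,3,3,2,2,1,1,9))

# Hypothetical stand-in for dplyr::lag(): shift x down by n positions,
# padding the start with `default`; the result has the same length as x.
shift_down <- function(x, n, default = 0) {
  c(rep(default, n), x)[seq_along(x)]
}

targets <- c(2, 4)  # classes whose sections trigger the skip
cls <- dd$class

# TRUE where the previous row or the 2nd previous row is a target class
after_target <- shift_down(cls, 1) %in% targets |
  shift_down(cls, 2) %in% targets

# Keep a row if it is itself a target class, or if it does not follow one
res_lag <- dd[cls %in% targets | !after_target, ]
print(res_lag)
```

As in the original filter, rows that themselves belong to a target class are always kept, even when they come right after another target section. The padding value 0 is assumed not to be one of the target classes.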
CodePudding user response:
Use run-length encoding to find the end point of each run of 2 or 4, then remove the 2 rows after each end point:
endpts <- with(rle(dd$class), cumsum(lengths)[values %in% c(2,4)] )
dd[-(endpts + rep(seq(2), each=length(endpts))),]
# rate class
#1 0.5 9
#2 0.7 9
#3 0.6 4
#4 0.5 4
#5 0.3 4
#6 0.9 4
#9 0.3 1
#10 0.3 1
#11 0.2 4
#12 0.1 4
#15 0.9 3
#16 1.0 2
#17 0.7 2
#20 0.6 9
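One thing to watch with the run-length approach: if no run of 2 or 4 exists, the set of rows to drop is empty, and negative indexing with an empty vector returns zero rows rather than the whole data frame. A minimal sketch of a wrapper that guards against this is below. The function name drop_after_runs and its arguments are hypothetical, not part of the answer above.

```r
dd <- data.frame(rate = c(0.5,0.7,0.6,0.5,0.3,0.9,0.8,0.6,0.3,0.3,0.2,0.1,0.2,0.4,0.9,1.0,0.7,0.8,0.9,0.6),
                 class = c(9,9,4,4,4,4,1,1,1,1,4,4,3,3,3,2,2,1,1,9))

# Hypothetical helper: drop the n rows that follow each run of `targets`
# in column `col` of data frame `df`.
drop_after_runs <- function(df, col, targets, n = 2) {
  runs <- rle(df[[col]])
  # Last row index of every run whose value is one of the targets
  endpts <- cumsum(runs$lengths)[runs$values %in% targets]
  # Candidate rows to drop: end + 1, ..., end + n for each end point
  drop <- as.vector(outer(endpts, seq_len(n), `+`))
  # Ignore indices that run past the last row
  drop <- drop[drop <= nrow(df)]
  # Guard: df[-integer(0), ] would return zero rows, so return df unchanged
  if (length(drop) == 0) return(df)
  df[-drop, , drop = FALSE]
}

res_rle <- drop_after_runs(dd, "class", c(2, 4))
print(res_rle)
```

Unlike a check on the class of each row, this drops the n rows after an end point regardless of their class, so a run of 2s that directly follows a run of 4s would be removed as well.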