Skipping number of rows after a certain value in R

Time:11-03

I have data that looks like the table below. I would like to drop the 2 rows that follow the last occurrence (the maximum row index) of certain types (3 and 4). For example, there are two 4s in my table, but I only need to remove the 2 rows after the second 4. Likewise for 3, I only need to remove the 2 rows after the third 3.

-----------------
|  grade | type |
-----------------
|   93   |   2  |
-----------------
|   90   |   2  |
-----------------
|   54   |   2  |
-----------------
|   36   |   4  |
-----------------
|   31   |   4  |
-----------------
|   94   |   1  |
-----------------
|   57   |   1  |
-----------------
|   16   |   3  |
-----------------
|   11   |   3  |
-----------------
|   12   |   3  |
-----------------
|   99   |   1  |
-----------------
|   99   |   1  |
-----------------

The desired output would be:

-----------------
|  grade | type |
-----------------
|   93   |   2  |
-----------------
|   90   |   2  |
-----------------
|   54   |   2  |
-----------------
|   36   |   4  |
-----------------
|   31   |   4  |
-----------------
|   16   |   3  |
-----------------
|   11   |   3  |
-----------------
|   12   |   3  |
-----------------

Here is the code for my example:

data <- data.frame(grade = c(93,90,54,36,31,94,57,16,11,12,99,99), type = c(2,2,2,4,4,1,1,3,3,3,1,1))

Could anyone give me some hints on how to approach this in R? Thanks a bunch in advance for your help and your time!
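Before picking a tool, the rule can be pinned down as concrete row numbers. The sketch below is one way to do that in base R, assuming that "last occurrence" can be marked with `duplicated(..., fromLast = TRUE)`; that idiom and the names `is_last`, `anchors` and `to_drop` are illustrative choices, not part of the question.

```r
data <- data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99),
                   type  = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))

# TRUE only at the last occurrence of each type value
is_last <- !duplicated(data$type, fromLast = TRUE)

# keep only the last 3 and the last 4: rows 10 and 5
anchors <- which(is_last & data$type %in% c(3, 4))

# the two rows after each anchor are the ones to remove: 6, 7, 11, 12
to_drop <- c(anchors + 1, anchors + 2)
result <- data[-to_drop, ]
```

`result` keeps rows 1 to 5 and 8 to 10, which is the desired output shown above.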

CodePudding user response:

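library(dplyr)
data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99), 
           type = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1)) %>%
  # .before = 2, .after = -2: each window holds only the element two rows back
  mutate(large_shadow = slider::slide_dbl(type, ~sum(.x >= 3), .before = 2, .after = -2)) %>% 
  # keep rows that do not sit two rows after a row with type >= 3
  filter(large_shadow < 1)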

  grade type large_shadow
1    93    2            0
2    90    2            0
3    54    2            0
4    36    4            0
5    31    4            0
6    16    3            0
7    11    3            0

Or a base R approach:

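df <- data                                          # work on a copy of the example data
df$large  <- df$type >= 3                           # rows with type 3 or 4
df$shadow <- c(0, 0, df$large[1:(nrow(df) - 2)])    # the same flag shifted down two rows
df <- df[df$shadow == 0, 1:2]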
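In the slider call, `.before = 2, .after = -2` makes each window contain only the element two rows back, and the base R version builds the same flag by shifting `large` down two rows. A minimal base R sketch of that lag-by-two flag (using `head(x, -2)` for the shift; `flag` is an illustrative name) lists the rows it removes. Because the flag fires after every row with type 3 or 4, not only after the last one, it also drops row 10 (grade 12), which is why the output above ends at grade 11 instead of matching the desired output.

```r
data <- data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99),
                   type  = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))

# flag[i] is TRUE when row i - 2 has type >= 3;
# the first two rows have no such predecessor, so they are never flagged
flag <- c(FALSE, FALSE, head(data$type >= 3, -2))

which(flag)            # rows 6, 7, 10, 11, 12
result <- data[!flag, ]
```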

CodePudding user response:

Using some indexing:

data[-(nrow(data) - match(c(3,4), rev(data$type)) + 1 + rep(1:2, each=2)),]
#   grade type
#1     93    2
#2     90    2
#3     54    2
#4     36    4
#5     31    4
#8     16    3
#9     11    3
#10    12    3

Or more generically:

vals <- c(3,4)
data[-(nrow(data) - match(vals, rev(data$type)) + 1 + rep(1:2, each=length(vals))),]

The logic: `match()` finds the first instance of each value in the reversed column, which is its last instance in the original order; `nrow(data) - position + 1` turns that back into the original row index; adding 1 and 2 gives the two rows after it; and the negative index drops those rows.
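The same steps, run one at a time on the example data so each intermediate vector can be checked (the variable names are illustrative):

```r
data <- data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99),
                   type  = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))
vals <- c(3, 4)

# rev(data$type) is 1 1 3 3 3 1 1 4 4 2 2 2
pos_rev  <- match(vals, rev(data$type))     # first hit from the end: 3, 8
last_idx <- nrow(data) - pos_rev + 1        # back to original rows: 10, 5

# last_idx is recycled to 10, 5, 10, 5 and offset by 1, 1, 2, 2
to_drop <- last_idx + rep(1:2, each = length(vals))   # 11, 6, 12, 7
result  <- data[-to_drop, ]
```

`rep(1:2, each = length(vals))` lines up with the recycled `last_idx`, so every last occurrence gets both the +1 and the +2 offset.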

CodePudding user response:

data[-c(max(which(data$type==3)) + 1:2, max(which(data$type==4)) + 1:2),]

#    grade type
# 1     93    2
# 2     90    2
# 3     54    2
# 4     36    4
# 5     31    4
# 8     16    3
# 9     11    3
# 10    12    3
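The one-liners above assume every value in `vals` occurs in the column. When a value is missing, `match()` returns `NA` and `max(which(...))` returns `-Inf` with a warning, so the computed index is no longer a row number. A hypothetical helper, `drop_after_last()` (not from any package), that skips missing values and clips the drop list to the table might look like this:

```r
# Hypothetical helper: drop the `n` rows that follow the last occurrence
# of each value in `vals` within column `col`.
drop_after_last <- function(df, col, vals, n = 2) {
  # last row index of each value, NA when the value never occurs
  last_idx <- vapply(vals, function(v) {
    hits <- which(df[[col]] == v)
    if (length(hits) == 0) NA_integer_ else max(hits)
  }, integer(1))
  last_idx <- last_idx[!is.na(last_idx)]

  # rows last_idx + 1, ..., last_idx + n, clipped to the table
  to_drop <- as.vector(outer(last_idx, seq_len(n), `+`))
  to_drop <- to_drop[to_drop <= nrow(df)]

  # logical indexing: an empty drop list keeps every row
  df[!(seq_len(nrow(df)) %in% to_drop), , drop = FALSE]
}

data <- data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99),
                   type  = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))
drop_after_last(data, "type", c(3, 4))   # rows 1-5 and 8-10
drop_after_last(data, "type", 1)         # last 1 is the final row: nothing removed
```

Logical indexing with `%in%` is used instead of a negative index because, after clipping, the drop list can be empty, and `df[-integer(0), ]` would return zero rows rather than the whole table.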

CodePudding user response:

Similar to Ric, but I find it a bit easier to read (way more verbose, though):

library(dplyr)
idx <- data %>% mutate(id = row_number()) %>%
  filter(type %in% 3:4) %>% group_by(type) %>% filter(id == max(id)) %>% pull(id)
data[-c(idx + 1, idx + 2), ]  # drop the two rows after the last 3 and the last 4
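For comparison, the `group_by()` plus `max(id)` step can be done in base R with `tapply()`, which returns the largest row number per type. A minimal sketch; it assumes the groups are labelled "3" and "4", which is how `tapply()` names the groups of a numeric grouping vector:

```r
data <- data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99),
                   type  = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))

# highest row number per type (groups "1", "2", "3", "4")
last_by_type <- tapply(seq_len(nrow(data)), data$type, max)

idx <- as.vector(last_by_type[c("3", "4")])   # 10 and 5
result <- data[-c(idx + 1, idx + 2), ]
```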