Is there a way to delete rows in R that do not follow a sequence?

I have a dataframe in R and need to remove rows that do not follow an expected sequence in a column. A shortened version of my dataframe is as follows:

splits_level <- structure(list(name = c("1", "2", "3", "1", "2", "3", "1", "2", 
"3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", 
"1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "4", 
"1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", 
"2", "3", "4"), value = c(NA, NA, NA, "5", "4", "3", "00:01:35.780", 
"00:03:12.220", "00:04:50.010", NA, NA, NA, "d500m", "d1000m", 
"d1500m", "7cc15908-19a4-4e71-aa7a-8381000f47b5", "53b98dcd-f995-45a3-8803-395cdaedb4c2", 
"8aedc73c-1780-4dc8-a2f8-4179c16e7b49", "7cc15908-19a4-4e71-aa7a-8381000f47b5", 
"53b98dcd-f995-45a3-8803-395cdaedb4c2", "8aedc73c-1780-4dc8-a2f8-4179c16e7b49", 
"31f1f791-977a-497d-9f38-540f66e54040", "58b439af-8221-43d2-81cd-9b21455441c1", 
"c98a8ecc-9a58-40b1-8077-94df26507807", "40a17577-c7fd-4a69-b2a7-a95e28a186e6", 
"40a17577-c7fd-4a69-b2a7-a95e28a186e6", "40a17577-c7fd-4a69-b2a7-a95e28a186e6", 
"02c324d6-ec9f-4920-aeae-1416ae509f5f", "37f3526b-6ff9-495d-b8d3-5224330635fc", 
"a0dfc090-93ab-443b-b764-9b596cace54f", NA, NA, NA, NA, "6", 
"5", "2", "1", "00:01:35.930", "00:03:12.630", "00:04:49.950", 
"00:06:27.120", NA, NA, NA, NA, "d500m", "d1000m", "d1500m", 
"d2000m")), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))

I would like to remove the rows that do not follow the sequence "1, 2, 3, 4" in the name column - in this example it would be the first 30 rows, but it may not necessarily always be the first 30. They could be in the middle of the df etc.

I am new to R and stuck with how to achieve this. Any help would be greatly appreciated! Thanks!

CodePudding user response：

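Perhaps this? Start a new group at every "1", then keep only the groups that match the full sequence exactly:

library(dplyr)

vec <- as.character(1:4)
splits_level %>%
  # a new group starts at every "1"
  group_by(grp = cumsum(name == "1")) %>%
  # keep a group only if it is exactly "1", "2", "3", "4"
  dplyr::filter(n() == length(vec) && all(name == vec)) %>%
  ungroup() %>%
  select(-grp)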
# # A tibble: 20 × 2
#    name  value       
#    <chr> <chr>       
#  1 1     NA          
#  2 2     NA          
#  3 3     NA          
#  4 4     NA          
#  5 1     6           
#  6 2     5           
#  7 3     2           
#  8 4     1           
#  9 1     00:01:35.930
# 10 2     00:03:12.630
# 11 3     00:04:49.950
# 12 4     00:06:27.120
# 13 1     NA          
# 14 2     NA          
# 15 3     NA          
# 16 4     NA          
# 17 1     d500m       
# 18 2     d1000m      
# 19 3     d1500m      
# 20 4     d2000m
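
To see why the grouping step works, here is a minimal base R sketch on a made-up vector (`toy` is hypothetical, not taken from the question's data). It assumes, as in the question, that every run starts with "1":

```r
# Toy vector (made up for illustration): two 3-runs followed by one 4-run
toy <- c("1", "2", "3", "1", "2", "3", "1", "2", "3", "4")
vec <- as.character(1:4)

# Every "1" starts a new run, so the running count of "1"s works as a group id
grp <- cumsum(toy == "1")

# TRUE for each group that is exactly "1", "2", "3", "4"
ok <- vapply(split(toy, grp), function(x) identical(x, vec), logical(1))

# Map the per-group flag back onto the individual elements
keep <- unname(ok[as.character(grp)])
toy[keep]
```

Only the last group survives here, because it is the only one whose length and contents both match `vec`.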

CodePudding user response：

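In base R, you could use which() to find the positions where name is "4", then Vectorize a user-defined function that builds the row numbers for each of those positions together with the three rows before it. Then simply index the original data with those row numbers. Note that this assumes every "4" is preceded by "1", "2", "3", which holds for the data shown: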

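# For a row number x, return the four row numbers x-3, x-2, x-1, x
vecfun <- Vectorize(function(x){
  x <- as.numeric(x)
  rev(seq(x, x-3, -1))
})

# vecfun() returns a 4-row matrix (one column per "4"); as.vector() flattens it column by column
splits_level[as.vector(vecfun(which(splits_level$name %in% "4"))),]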

Output:

#    name  value       
#    <chr> <chr>       
#  1 1     NA          
#  2 2     NA          
#  3 3     NA          
#  4 4     NA          
#  5 1     6           
#  6 2     5           
#  7 3     2           
#  8 4     1           
#  9 1     00:01:35.930
# 10 2     00:03:12.630
# 11 3     00:04:49.950
# 12 4     00:06:27.120
# 13 1     NA          
# 14 2     NA          
# 15 3     NA          
# 16 4     NA          
# 17 1     d500m       
# 18 2     d1000m      
# 19 3     d1500m      
# 20 4     d2000m
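
If a "4" might ever appear without "1", "2", "3" directly before it, a stricter variant could verify each window before keeping it. This is only a sketch under that assumption; `full_run_rows` is a hypothetical helper name, and `toy` is made-up data:

```r
# Hypothetical helper: row indices of every complete run of `vec` found in `x`
full_run_rows <- function(x, vec = as.character(1:4)) {
  n <- length(vec)
  # Candidate end positions: rows holding the last value, with room for a full run
  ends <- which(x == vec[n])
  ends <- ends[ends >= n]
  # Keep an end position only if the n rows ending there match vec exactly
  ok <- vapply(ends, function(e) identical(x[(e - n + 1):e], vec), logical(1))
  # Expand each valid end position into its n row indices
  as.integer(unlist(lapply(ends[ok], function(e) (e - n + 1):e)))
}

# Made-up example: the second "4" is not preceded by "1", "2", "3"
toy <- c("1", "2", "3", "1", "2", "3", "4", "2", "3", "4")
full_run_rows(toy)
```

With the question's data, `splits_level[full_run_rows(splits_level$name), ]` should select the same 20 rows as above, since every "4" there closes a complete run.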