How to delete rows until a condition and keep subsequent ones until another condition

Time:08-26

I have a data set that looks like this:

id <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
year <- c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2015, 2016, 2017, 2019, 2020, 2021, 2022)
GP <- c(0,4,4,2,3,4,7,3,0,2,1,0,0,3)

df <- cbind(id, year, GP)
df
     id year GP
 [1,]  1 2013  0
 [2,]  1 2014  4
 [3,]  1 2015  4
 [4,]  2 2016  2
 [5,]  2 2017  3
 [6,]  2 2018  4
 [7,]  2 2019  7
 [8,]  3 2015  3
 [9,]  3 2016  0
[10,]  3 2017  2
[11,]  4 2019  1
[12,]  4 2020  0
[13,]  4 2021  0
[14,]  4 2022  3

GP refers to games played. For each person (id), I want to delete all rows before they first play (that is, before GP > 0), keep all observations in which the player plays continuously, and also keep the first row in which the player has stopped playing (GP = 0). Any rows after that first stop should be dropped. My dataset should then look like this:

     id year GP
 [2,]  1 2014  4
 [3,]  1 2015  4
 [4,]  2 2016  2
 [5,]  2 2017  3
 [6,]  2 2018  4
 [7,]  2 2019  7
 [8,]  3 2015  3
 [9,]  3 2016  0
[11,]  4 2019  1
[12,]  4 2020  0

Hence, rows 1, 10, 13, and 14 are deleted. I was able to delete the observations before a player first started playing using:

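library(dplyr)

# cbind() above returns a matrix; convert it so the dplyr verbs work
df <- as.data.frame(df) %>%
  group_by(id) %>%
  filter(cumany(GP > 0)) %>%   # drop rows before the first GP > 0 within each id
  ungroup()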

But I cannot work out the second part. After running the code above, I tried deleting all observations in which GP = 0 and then adding a row with GP = 0 at the bottom of each id using complete from the tidyr package, but without success.

Answer:

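Perhaps add a slice afterwards. Within each id, match(0, GP, nomatch = n()) gives the position of the first 0 that remains after the filter, or the group size if there is no 0, and seq_len() keeps the rows up to and including that position.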

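library(dplyr)
df %>%
  group_by(id) %>%
  # keep rows from the first GP > 0 onward
  filter(cumany(GP > 0)) %>%
  # then keep rows up to the first 0 (or the whole group if there is none)
  slice(seq_len(match(0, GP, nomatch = n()))) %>%
  ungroup()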

Output:

# A tibble: 10 × 3
      id  year    GP
   <dbl> <dbl> <dbl>
 1     1  2014     4
 2     1  2015     4
 3     2  2016     2
 4     2  2017     3
 5     2  2018     4
 6     2  2019     7
 7     3  2015     3
 8     3  2016     0
 9     4  2019     1
10     4  2020     0
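
To see why the slice index works, the sketch below applies the same match expression to the GP values of two groups as they look after the cumany filter. It is only an illustration: length(gp) stands in for n(), which is available only inside dplyr verbs, and the helper name keep_idx is made up for this example.

```r
# Hypothetical helper (not part of dplyr): the row positions that
# slice() would keep for one group's GP values
keep_idx <- function(gp) seq_len(match(0, gp, nomatch = length(gp)))

# id 3 after the filter: the first 0 is at position 2, so rows 1 and 2 are kept
keep_idx(c(3, 0, 2))

# id 2 after the filter: no 0 at all, so nomatch falls back to the group size
keep_idx(c(2, 3, 4, 7))
```

The first call returns positions 1 and 2, the second returns positions 1 through 4, which matches the rows shown for ids 3 and 2 in the output.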

data

# cbind() as used in the question returns a matrix; the dplyr verbs need a data frame
df <- data.frame(id, year, GP)
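
If dplyr is not available, the same rule can be sketched in base R. This is an alternative formulation rather than the method used in the answer; the helper name keep_run and the variables keep and result are made up for this example, and it assumes the rows are already ordered by year within each id.

```r
# Example data from the question
id   <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
year <- c(2013, 2014, 2015, 2016, 2017, 2018, 2019,
          2015, 2016, 2017, 2019, 2020, 2021, 2022)
GP   <- c(0, 4, 4, 2, 3, 4, 7, 3, 0, 2, 1, 0, 0, 3)
df   <- data.frame(id, year, GP)

# Hypothetical helper: for one group's GP values, flag the rows to keep
keep_run <- function(gp) {
  started <- cumsum(gp > 0) > 0         # TRUE from the first GP > 0 onward
  stops   <- cumsum(started & gp == 0)  # number of zeros seen since the start
  # keep the playing streak (no zero yet) plus the first zero itself
  started & (stops == 0 | (stops == 1 & gp == 0))
}

# ave() applies keep_run within each id and returns 0/1 in the original row order
keep   <- as.logical(ave(df$GP, df$id, FUN = keep_run))
result <- df[keep, ]
result
```

Because base R subsetting keeps the original row names, result retains rows 2 to 9, 11, and 12, the same rows as in the desired output of the question.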