I have a data set that looks like this:
id <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
year <- c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2015, 2016, 2017, 2019, 2020, 2021, 2022)
GP <- c(0,4,4,2,3,4,7,3,0,2,1,0,0,3)
df <- cbind(id, year, GP)
df
id year GP
[1,] 1 2013 0
[2,] 1 2014 4
[3,] 1 2015 4
[4,] 2 2016 2
[5,] 2 2017 3
[6,] 2 2018 4
[7,] 2 2019 7
[8,] 3 2015 3
[9,] 3 2016 0
[10,] 3 2017 2
[11,] 4 2019 1
[12,] 4 2020 0
[13,] 4 2021 0
[14,] 4 2022 3
GP refers to games played. What I want to do now is delete all rows before a person (id) has first played, i.e. before GP > 0,
then keep all observations in which the player plays continuously, plus the row in which the player first stops playing (the first GP = 0 after that). My dataset should then look like this:
id year GP
[2,] 1 2014 4
[3,] 1 2015 4
[4,] 2 2016 2
[5,] 2 2017 3
[6,] 2 2018 4
[7,] 2 2019 7
[8,] 3 2015 3
[9,] 3 2016 0
[11,] 4 2019 1
[12,] 4 2020 0
Hence, rows 1,10,13 and 14 are deleted. I was able to delete observations before a player has first started playing using:
df <- df %>%
group_by(id) %>%
filter(cumany(GP > 0)) %>%
ungroup
But I am not able to do the second part. After running the above code, I tried to delete all observations in which GP = 0 and then add a row at the bottom for each id in which GP = 0, using complete() from the tidyr package, but without success.
CodePudding user response:
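Perhaps add a slice() afterwards, which keeps the rows of each id up to and including the first GP of 0 (or all rows if the id has no 0 left):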
library(dplyr)
df %>%
group_by(id) %>%
filter(cumany(GP > 0)) %>%
slice(seq_len(match(0, GP, nomatch = n()))) %>%
ungroup
-output
# A tibble: 10 × 3
id year GP
<dbl> <dbl> <dbl>
1 1 2014 4
2 1 2015 4
3 2 2016 2
4 2 2017 3
5 2 2018 4
6 2 2019 7
7 3 2015 3
8 3 2016 0
9 4 2019 1
10 4 2020 0
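To see why the slice() step works, it may help to look at the index it computes for a single group. The sketch below uses plain vectors instead of grouped data, with length() standing in for n(); the vector names are made up for illustration.

```r
# GP for id 3 after the leading zero-game rows are dropped (stops in year 2)
gp_stops <- c(3, 0, 2)
# GP for id 2, who never has a zero-game season
gp_never_stops <- c(2, 3, 4, 7)

# match() gives the position of the first 0; if there is no 0 it
# returns `nomatch`, here the group length, so every row is kept
first_zero_stops <- match(0, gp_stops, nomatch = length(gp_stops))
first_zero_never <- match(0, gp_never_stops, nomatch = length(gp_never_stops))

# seq_len() turns that position into the row indices that slice() keeps
keep_stops <- seq_len(first_zero_stops)
keep_never <- seq_len(first_zero_never)
```

Inside the grouped pipeline the same calculation runs once per id, which is why the row with the first 0 is kept and everything after it is dropped.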
data (built as a data.frame; the matrix returned by cbind() has no group_by() method)
df <- data.frame(id, year, GP)
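For comparison, the same filtering can be sketched without dplyr, using ave() from base R to do the per-id cumulative sums. This is only an alternative illustration, not the approach given in the answer above; the helper column names (started, zero_after_start, zeros_before) are made up.

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  year = c(2013, 2014, 2015, 2016, 2017, 2018, 2019,
           2015, 2016, 2017, 2019, 2020, 2021, 2022),
  GP   = c(0, 4, 4, 2, 3, 4, 7, 3, 0, 2, 1, 0, 0, 3)
)

# TRUE from the first season with GP > 0 onwards, within each id
started <- ave(df$GP > 0, df$id, FUN = function(x) cumsum(x) > 0)

# 1 for a zero-game season that occurs after the player has started
zero_after_start <- as.integer(started & df$GP == 0)

# number of such zero-game seasons strictly before the current row, per id
zeros_before <- ave(zero_after_start, df$id,
                    FUN = function(z) cumsum(z) - z)

# keep rows from the first game up to and including the first zero
result <- df[started & zeros_before == 0, ]
```

Subtracting z from its own cumulative sum is what keeps the first zero-game row itself: that row has not yet "seen" a zero before it, while every later row has.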