I am trying to reshape my data by generating a dummy variable and deleting some observations. The data have the following form:
> df
   id GP
1   1  0
2   1  3
3   1  4
4   2  0
5   2  3
6   2  0
7   3  2
8   3  2
9   4  5
10  4  0
GP refers to games played. Now I want to generate a dummy variable, let's call it entry, that shows when a player first started playing, and then delete the observations after the player has first entered. My final dataset should look like this:
> df
  id GP entry
1  1  0     0
2  1  3     1
3  2  0     1
4  2  3     1
5  3  2     1
6  4  5     1
Hence, rows 3, 6, 8 and 10 of the original dataset were deleted. I have tried generating a dummy variable and then deleting rows:
df$entry <- ifelse(df$GP > 0, 1, 0)
for (i in 1:nrow(df)) {
df <- df[! (df$entry[i] ( if (df$entry[i] == 1 & df$entry[i-1] == 1 & df&id[i] == df&id[i-1] |
df$entry[i] == 1 & df$entry[i-1] == 0 & df&id[i] == df&id[i-1] ))),]
}
Here I generated a dummy that equals 1 whenever GP > 0 and then tried to delete observations according to the if condition in the loop, that is, delete rows in which a player has entry = 1 more than once, as well as rows after the first entry = 1. However, I get the following error:
Error: unexpected ')' in:
" df <- df[! (df$entry[i] ( if (df$entry[i] == 1 & df$entry[i-1] == 1 & df&id[i] == df&id[i-1] |
df$entry[i] == 1 & df$entry[i-1] == 0 & df&id_test[i] == df&id_test[i-1] ))"
Deleting the parenthesis only results in further errors. I would gladly appreciate any help or suggestions.
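As an illustration (not part of the original question), the rule described above, keep each player's rows up to and including the first row with GP > 0, can be written in base R without a loop. This is a minimal sketch assuming the rows are already in time order within each id; the name prior_entries is illustrative only.

```r
# Example data from the question
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
  GP = c(0L, 3L, 4L, 0L, 3L, 0L, 2L, 2L, 5L, 0L)
)

# Dummy as defined in the question: 1 whenever GP > 0
df$entry <- ifelse(df$GP > 0, 1, 0)

# For each row, count the rows with entry == 1 that came strictly before it
# within the same id. ave() applies FUN per id group and returns a vector
# aligned with df. Assumes rows are in time order within each id.
prior_entries <- ave(df$entry, df$id, FUN = function(x) cumsum(x) - x)

# Keep rows up to and including each player's first entry
result <- df[prior_entries == 0, ]
print(result)
```

Because entry follows the ifelse(GP > 0, 1, 0) definition, player 2's first row gets entry = 0 here, whereas the desired table above shows 1 in that row.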
Answer:
library(dplyr)
df %>%
  group_by(id) %>%
  filter(row_number() <= which.max(GP > 0))
# A tibble: 6 x 2
# Groups:   id [4]
     id    GP
  <int> <int>
1     1     0
2     1     3
3     2     0
4     2     3
5     3     2
6     4     5
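Why this works: on a logical vector, which.max() returns the position of the first TRUE, so the filter keeps rows 1 through the first row with GP > 0. A small base R check on player 1's values from the question (the variable names are illustrative):

```r
gp <- c(0L, 3L, 4L)              # games played by player 1
first_entry <- which.max(gp > 0) # position of the first TRUE
print(first_entry)               # 2

# Same condition as filter(row_number() <= which.max(GP > 0))
keep <- seq_along(gp) <= first_entry
print(gp[keep])                  # 0 3

# Caveat: if there is no TRUE at all, which.max() still returns 1,
# so a player who never played would keep only their first row.
print(which.max(c(0L, 0L) > 0))  # 1
```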
Answer:
We could use
library(dplyr)
df %>%
  group_by(id) %>%
  filter(cumsum(cumsum(GP > 0)) < 2) %>%
  ungroup
Output:
# A tibble: 6 × 2
     id    GP
  <int> <int>
1     1     0
2     1     3
3     2     0
4     2     3
5     3     2
6     4     5
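The double cumsum() matters when GP drops back to 0 after the first entry: a single cumsum(GP > 0) then stays flat at 1, so no threshold on it alone separates the entry row from the rows after it. A base R trace on player 2's values (0, 3, 0) from the question:

```r
gp <- c(0L, 3L, 0L)         # player 2 from the question
played <- gp > 0            # FALSE TRUE FALSE
once   <- cumsum(played)    # 0 1 1: flat after the entry, ambiguous
twice  <- cumsum(once)      # 0 1 2: keeps growing after the entry
keep   <- twice < 2         # TRUE TRUE FALSE: stops right after the entry
print(data.frame(gp, played, once, twice, keep))
```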
Or with slice():
df %>%
  group_by(id) %>%
  slice(seq_len(which(GP > 0)[1])) %>%
  ungroup
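None of the pipelines above adds the entry column from the question. A minimal sketch that adds it after filtering, assuming entry is the ifelse(GP > 0, 1, 0) dummy the question defines (written here as as.integer(GP > 0)):

```r
library(dplyr)

df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
  GP = c(0L, 3L, 4L, 0L, 3L, 0L, 2L, 2L, 5L, 0L)
)

result <- df %>%
  group_by(id) %>%
  # keep rows up to and including each player's first row with GP > 0
  filter(cumsum(cumsum(GP > 0)) < 2) %>%
  ungroup() %>%
  # assumed definition of the dummy: 1 whenever GP > 0
  mutate(entry = as.integer(GP > 0))

print(result)
```

Under this definition player 2's first row gets entry = 0; the table in the question shows 1 there, so the rule would need adjusting if that value is intended.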
Data:
df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
GP = c(0L, 3L, 4L, 0L, 3L, 0L, 2L, 2L, 5L, 0L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Answer:
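This works for this example dataset, but please check it against your original data: it removes the last row of each player (the row where lead(GP) is NA) rather than the rows after the first entry, and the final mutate() sets entry = 0 only for the first row of the whole table.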
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(entry = lead(GP)) %>%
  na.omit %>%
  ungroup() %>%
  mutate(entry = ifelse(row_number() == 1, 0, 1))
     id    GP entry
  <int> <int> <dbl>
1     1     0     0
2     1     3     1
3     2     0     1
4     2     3     1
5     3     2     1
6     4     5     1
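To see why this approach should be checked on real data, consider a hypothetical player (id 5, not in the question's data) who enters in their first row and then has two more rows. The lead()/na.omit() pipeline drops only that player's last row, while the cumsum() filter from the previous answer drops everything after the entry:

```r
library(dplyr)

# Hypothetical player, not from the question: enters in the first row
df2 <- data.frame(id = c(5L, 5L, 5L), GP = c(4L, 0L, 0L))

lead_version <- df2 %>%
  group_by(id) %>%
  mutate(entry = lead(GP)) %>%  # NA only in the last row of each id
  na.omit() %>%
  ungroup()

cumsum_version <- df2 %>%
  group_by(id) %>%
  filter(cumsum(cumsum(GP > 0)) < 2) %>%
  ungroup()

print(nrow(lead_version))    # 2: only the last row was dropped
print(nrow(cumsum_version))  # 1: every row after the first entry was dropped
```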