Deleting rows based on previous cells in R


I am trying to reshape my data by generating a dummy variable and deleting some observations. It has the following form:

> df
   id GP
1   1  0
2   1  3
3   1  4
4   2  0
5   2  3
6   2  0
7   3  2
8   3  2
9   4  5
10  4  0

GP refers to games played. I want to generate a dummy variable, call it entry, that marks when a player first started playing, and then delete the observations that come after the player's first entry. My final dataset should look like this:

    > df
       id GP entry
    1   1  0  0
    2   1  3  1
    3   2  0  1
    4   2  3  1
    5   3  2  1
    6   4  5  1

Hence, rows 3, 6, 8 and 10 of the original dataset are deleted. I have tried generating a dummy variable and then deleting rows:

df$entry <- ifelse(df$GP > 0, 1, 0)

for (i in 1:nrow(df)) {
  df <- df[! (df$entry[i] ( if (df$entry[i] == 1 & df$entry[i-1] == 1 & df&id[i] == df&id[i-1] |
                              df$entry[i] == 1 & df$entry[i-1] == 0 & df&id[i] == df&id[i-1] ))),]

Here I generated a dummy that equals 1 whenever GP > 0, and then I wanted to delete observations according to the if condition in the loop, that is, delete rows in which a player has entry = 1 more than once and rows after entry = 1. However, I get the following error:

Error: unexpected ')' in:
"  df <- df[! (df$entry[i] ( if (df$entry[i] == 1 & df$entry[i-1] == 1 & df&id[i] == df&id[i-1] |
                              df$entry[i] == 1 & df$entry[i-1] == 0 & df&id_test[i] == df&id_test[i-1] ))"

Deleting the parentheses only results in further errors. I would greatly appreciate any help or suggestions.

CodePudding user response:

library(dplyr)

df %>%
   group_by(id) %>%
   filter(row_number() <= which.max(GP > 0))

# A tibble: 6 x 2
# Groups:   id [4]
     id    GP
  <int> <int>
1     1     0
2     1     3
3     2     0
4     2     3
5     3     2
6     4     5
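
To see why the filter above works, it may help to look at which.max() on its own. The following is a small base R sketch (the vector names are made up for illustration): on a logical vector, which.max() returns the position of the first TRUE, so comparing row positions against it keeps everything up to and including the first game played. One thing to check against real data: if a player never has GP > 0, which.max() still returns 1, so only that player's first row would be kept.

```r
# which.max() on a logical vector returns the position of the first TRUE
played <- c(0, 3, 4) > 0                  # FALSE TRUE TRUE
first_game <- which.max(played)           # 2
keep <- seq_along(played) <= first_game   # TRUE TRUE FALSE

# Edge case: with no TRUE at all, which.max() still returns 1
never_played <- which.max(c(0, 0) > 0)    # 1
```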

CodePudding user response:

We could use

df %>% 
  group_by(id) %>% 
  filter(cumsum(cumsum(GP > 0)) < 2) %>%
  ungroup()

# A tibble: 6 × 2
     id    GP
  <int> <int>
1     1     0
2     1     3
3     2     0
4     2     3
5     3     2
6     4     5
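
The double cumsum can look odd at first, so here is a worked sketch on the GP values of id 2 (variable names are illustrative only). A single cumsum stays at 1 on a later row with GP = 0, so it would not drop that row; the second cumsum keeps growing on every row after the first game, which is what lets `< 2` cut everything after the entry row.

```r
gp <- c(0, 3, 0)              # GP values for id 2 in the example data

once  <- cumsum(gp > 0)       # 0 1 1
twice <- cumsum(once)         # 0 1 2

keep_single <- once < 2       # TRUE TRUE TRUE  (trailing 0 row survives)
keep_double <- twice < 2      # TRUE TRUE FALSE (rows after entry dropped)
```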

Or with slice

df %>%
   group_by(id) %>%
   slice(seq_len(which(GP > 0)[1])) %>%
   ungroup()

Data:

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L), 
    GP = c(0L, 3L, 4L, 0L, 3L, 0L, 2L, 2L, 5L, 0L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
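
If dplyr is not available, the same idea can be sketched in base R with ave(). This is an alternative technique, not part of the answer above. It also adds the entry column, under the assumption that entry should be 1 exactly when GP > 0 on a kept row; note that this gives 0 for the id 2, GP 0 row, whereas the expected output in the question shows 1 there.

```r
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
  GP = c(0L, 3L, 4L, 0L, 3L, 0L, 2L, 2L, 5L, 0L)
)

played <- as.integer(df$GP > 0)

# Per id: number of rows with GP > 0 strictly before the current row
played_before <- ave(played, df$id, FUN = function(x) cumsum(x) - x)

# Keep rows up to and including the first game played
result <- df[played_before == 0, ]

# Assumption: entry marks the row where the player first has GP > 0
result$entry <- as.integer(result$GP > 0)
rownames(result) <- NULL
```

Rows with `played_before == 0` are those with no earlier game for that player, so a player who never plays would keep all rows under this sketch.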

CodePudding user response:

For this dataset this should work. Please check it for your original dataset:

df %>% 
  group_by(id) %>% 
  mutate(entry = lead(GP)) %>% 
  na.omit %>% 
  ungroup() %>% 
  mutate(entry = ifelse(row_number()==1, 0, 1))

     id    GP entry
  <int> <int> <dbl>
1     1     0     0
2     1     3     1
3     2     0     1
4     2     3     1
5     3     2     1
6     4     5     1
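
As a hedged illustration of why this should be checked on the real data: in the example, every player happens to have exactly one row after the first game, and the lead()/na.omit step removes only the last row of each id. The sketch below uses a made-up player (id 5, hypothetical data) with two rows after the first game and compares that step with a filter on the first positive GP.

```r
library(dplyr)

# Hypothetical data: player 5 has two rows after the first game played
df2 <- data.frame(id = c(5L, 5L, 5L), GP = c(2L, 1L, 0L))

# The lead()/na.omit() step only removes the last row of each id
by_lead <- df2 %>%
  group_by(id) %>%
  mutate(entry = lead(GP)) %>%
  na.omit() %>%
  ungroup()

# Filtering on the first positive GP keeps just the entry row
by_first <- df2 %>%
  group_by(id) %>%
  filter(row_number() <= which.max(GP > 0)) %>%
  ungroup()

nrow(by_lead)   # 2
nrow(by_first)  # 1
```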
