I am looking for a way to fill in a column in R based on values in a different column. Below is what my data looks like.
year | action | player | end |
---|---|---|---|
2001 | 1 | Mike | 2003 |
2002 | 0 | Mike | NA |
2003 | 0 | Mike | NA |
2004 | 0 | Mike | NA |
2001 | 0 | Alan | NA |
2002 | 0 | Alan | NA |
2003 | 1 | Alan | 2004 |
2004 | 0 | Alan | NA |
I would like to either change the "action" column or create a new column such that it reflects the duration between the "year" and "end" variables. Below is what it would look like:
year | action | player | end |
---|---|---|---|
2001 | 1 | Mike | 2003 |
2002 | 1 | Mike | NA |
2003 | 1 | Mike | NA |
2004 | 0 | Mike | NA |
2001 | 0 | Alan | NA |
2002 | 0 | Alan | NA |
2003 | 1 | Alan | 2004 |
2004 | 1 | Alan | NA |
I have tried to do this with the following loop:
i <- 0
z <- 0
for (i in 1:nrow(df)){
i <- z i 1
if (df[i, 2] == 0) {}
else {df[i, 5] = (df[i, 4] - df[i, 1])}
z <- df[i,5]
for (z in i:nrow(df)){df[i, 2] = 1}
}
Here, my i value is skyrocketing, breaking the loop. I am not sure why that is occuring. I'd be interested to either know how to fix my approach or how to do this in a smarter fashion.
CodePudding user response:
There's no need for explicit loops here.
First group your data frame by player
. Then find the rows where the cumulative sum (cumsum
) of action
is greater than 0 and the year
is less than or equal to the end
year of the group. If the row meets these conditions, set action
to 1, otherwise to 0.
Using the dplyr package you could achieve this in a couple of lines:
library(dplyr)
df %>%
group_by(player) %>%
mutate(action = as.numeric(cumsum(action) > 0 & year <= na.omit(end)[1]))
#> # A tibble: 8 x 4
#> # Groups: player [2]
#> year action player end
#> <int> <dbl> <chr> <int>
#> 1 2001 1 Mike 2003
#> 2 2002 1 Mike NA
#> 3 2003 1 Mike NA
#> 4 2004 0 Mike NA
#> 5 2001 0 Alan NA
#> 6 2002 0 Alan NA
#> 7 2003 1 Alan 2004
#> 8 2004 1 Alan NA