Home > Net >  Fill in Column Based on Other Rows (R)
Fill in Column Based on Other Rows (R)

Time:12-02

I am looking for a way to fill in a column in R based on values in a different column. Below is what my data looks like.

year action player end
2001 1 Mike 2003
2002 0 Mike NA
2003 0 Mike NA
2004 0 Mike NA
2001 0 Alan NA
2002 0 Alan NA
2003 1 Alan 2004
2004 0 Alan NA

I would like to either change the "action" column or create a new column such that it reflects the duration between the "year" and "end" variables. Below is what it would look like:

year action player end
2001 1 Mike 2003
2002 1 Mike NA
2003 1 Mike NA
2004 0 Mike NA
2001 0 Alan NA
2002 0 Alan NA
2003 1 Alan 2004
2004 1 Alan NA

I have tried to do this with the following loop:

i <- 0 
z <- 0 
for (i in 1:nrow(df)){
  i <- z   i   1
  if (df[i, 2] == 0) {}
  else {df[i, 5] = (df[i, 4] -  df[i, 1])}
z <- df[i,5]
for (z in i:nrow(df)){df[i, 2] = 1}
  }

Here, my i value is skyrocketing, breaking the loop. I am not sure why that is occuring. I'd be interested to either know how to fix my approach or how to do this in a smarter fashion.

CodePudding user response:

There's no need for explicit loops here.

First group your data frame by player. Then find the rows where the cumulative sum (cumsum) of action is greater than 0 and the year is less than or equal to the end year of the group. If the row meets these conditions, set action to 1, otherwise to 0.

Using the dplyr package you could achieve this in a couple of lines:

library(dplyr)

df %>%
  group_by(player) %>%
  mutate(action = as.numeric(cumsum(action) > 0 & year <= na.omit(end)[1]))
#> # A tibble: 8 x 4
#> # Groups:   player [2]
#>    year action player   end
#>   <int>  <dbl> <chr>  <int>
#> 1  2001      1 Mike    2003
#> 2  2002      1 Mike      NA
#> 3  2003      1 Mike      NA
#> 4  2004      0 Mike      NA
#> 5  2001      0 Alan      NA
#> 6  2002      0 Alan      NA
#> 7  2003      1 Alan    2004
#> 8  2004      1 Alan      NA
  • Related