I thought this would be trivial, and I think it must be, but I am very tired and stuck at this problem at the moment.
Consider a df with two columns, one with a year, and the other with a binary variable indicating some event.
df <- data.frame(year = c(2000,2001,2002,2003,2004, 2005,2006,2007,2008,2010),
flag = c(0,0,0,1,0,0,0,1,0,0))
I want to create a third column that simply counts the years since the last flag and that resets when a new flag appears, like so:
I thought this code would do the job:
First, add a 0 as "year_since" for every year with a flag, then, if there was a flag in the previous year, add 1 to the value of the previous "year_since".
df <- df %>% mutate(year_since = ifelse(flag == 1, 0, NA)) %>%
mutate(year_since = ifelse(dplyr::lag(flag, n=1, order_by = "year") == 1 & is.na(year_since),
dplyr::lag(year_since, n=1, order_by = "year") 1, year_since))
However, this returns NA for every row that should be 1,2,3, and so on.
CodePudding user response:
You could do
df %>%
group_by(group = cumsum(flag)) %>%
mutate(year_since = ifelse(group == 0, NA, seq(n()) - 1)) %>%
ungroup() %>%
select(-group)
#> # A tibble: 10 x 3
#> year flag year_since
#> <dbl> <dbl> <dbl>
#> 1 2000 0 NA
#> 2 2001 0 NA
#> 3 2002 0 NA
#> 4 2003 1 0
#> 5 2004 0 1
#> 6 2005 0 2
#> 7 2006 0 3
#> 8 2007 1 0
#> 9 2008 0 1
#> 10 2010 0 2
Created on 2022-09-16 with reprex v2.0.2
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, year_since := (NA^!cummax(flag)) * rowid(cumsum(flag))-1]
-output
> df
year flag year_since
<num> <num> <num>
1: 2000 0 NA
2: 2001 0 NA
3: 2002 0 NA
4: 2003 1 0
5: 2004 0 1
6: 2005 0 2
7: 2006 0 3
8: 2007 1 0
9: 2008 0 1
10: 2010 0 2