I need to create a variable (V3) based on two pre-existing variables (V1 and V2). V1 is the year and V2 is a dummy variable. I want to create V3, which counts the number of years since the dummy variable (V2) first equals 1 in the dataset, with that first year counted as 1. See the required output of V3 below. Notice that when V1 skips a year from 2005 to 2007, the increment in V3 reflects that.
V1 | V2 | V3 |
---|---|---|
2001 | 0 | 0 |
2002 | 0 | 0 |
2003 | 1 | 1 |
2004 | 1 | 2 |
2005 | 1 | 3 |
2007 | 1 | 5 |
Here's the data:
df <- data.frame(V1 = c(2001, 2002, 2003, 2004, 2005, 2007),
                 V2 = c(0, 0, 1, 1, 1, 1))
My failed attempt using dplyr:
df2 <- df %>%
  mutate(V3 = case_when(V2 == 1 ~ V1 - min(V1)))
My attempt uses min(V1), which captures 2001 instead of 2003.
Thanks for your help.
Answer:
Using match and pmax:
library(dplyr)
df %>% mutate(V3 = pmax(V1 - V1[match(1, V2)] + 1, 0))
# V1 V2 V3
#1 2001 0 0
#2 2002 0 0
#3 2003 1 1
#4 2004 1 2
#5 2005 1 3
#6 2007 1 5
V1[match(1, V2)] returns the V1 value where V2 was 1 for the first time. We subtract that value from each V1 and add 1, so that the first year with V2 equal to 1 counts as year 1. pmax is then used to change the negative values to 0.
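One edge case worth noting: if V2 never equals 1, match(1, V2) returns NA and every V3 value becomes NA. Below is a minimal base R sketch of the same idea with a guard for that case. The helper name years_since_first is hypothetical (it is not part of dplyr or base R), and returning all zeros when there is no 1 is an assumption about the desired behaviour.

```r
# Same data as in the question.
df <- data.frame(V1 = c(2001, 2002, 2003, 2004, 2005, 2007),
                 V2 = c(0, 0, 1, 1, 1, 1))

# Hypothetical helper: years elapsed since `flag` first equals 1,
# counting that first year as 1.
years_since_first <- function(year, flag) {
  first_idx <- match(1, flag)           # position of the first 1, NA if there is none
  if (is.na(first_idx)) {
    return(rep(0, length(year)))        # assumption: no 1 anywhere means all zeros
  }
  pmax(year - year[first_idx] + 1, 0)   # rows before the first 1 are clamped to 0
}

df$V3 <- years_since_first(df$V1, df$V2)
df$V3
# [1] 0 0 1 2 3 5
```

The same helper can be used inside a dplyr pipeline, for example df %>% mutate(V3 = years_since_first(V1, V2)).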