I have a dataset with outcome variable - owl weight[y] and time in years when the outcome was recorded[t]. There is a third variable what type of feed was provided to the baby owls[w] and 4th column what year the new feed was introduced[a].
y t w a
-1.354 1994 Zi 2015
2.146 1994 Zi 2015
3.146 2001 Zi 2015
2.646 2011 Zi 2015
-2.354 2015 Zi 2015
-0.854 2016 Zi 2015
-3.854 2017 Zi 2015
-2.354 2021 Zi 2015
-2.354 2001 Ca 2010
1.146 2002 Ca 2010
-2.354 2003 Ca 2010
-0.854 2005 Ca 2010
-6.354 2011 Ca 2010
2.146 2019 Ca 2010
Step1 : I like to create a time to event variable
column from the index a
column, like this, t-a
=time to event variable
y t w a TimeToEvent
-1.354 1994 Zi 2015 -21
2.146 1994 Zi 2015 -21
3.146 2001 Zi 2015 -14
2.646 2011 Zi 2015 -4
-2.354 2015 Zi 2015 0
-0.854 2016 Zi 2015 1
-3.854 2017 Zi 2015 2
-2.354 2021 Zi 2015 6
-2.354 2001 Ca 2010 -9
1.146 2002 Ca 2010 -8
-2.354 2003 Ca 2010 -7
-0.854 2005 Ca 2010 -5
1.147 2010 Ca 2010 0
-6.354 2011 Ca 2010 1
2.146 2019 Ca 2010 9
Step2 : I like to create 4 lag
and 4 lead
variables like this
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
As you can see, I am only interested in tracking the lead and lag for 4 time periods before the index and after the index year [column a].
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
Any observations occurring outside the 4 lead or 4 lag periods are assigned 1 depending on whether they are t < a or t > a for example the first three observations time to event
is -12,-17 and -14 so these observations will be assigned 1 in Lg4 column (boundary) similarly first 4 observations of group(wa)=Ca is -9,-8,-7,-5 so they are assigned 1 in the in Lg4 column
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
Observations with time to event
outside lead will be assigned 1 on the other end( column Ld4)
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Final Expected dataset as shown in Step2 above
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Any suggestions regarding this is much appreciated. Thanks.
CodePudding user response:
It seems that you are only testing upon your time to event so a bunch of ifelse
will do:
library(tidyverse)
df=read.table(header=T, text="y t w a
-1.354 1994 Zi 2015
2.146 1994 Zi 2015
3.146 2001 Zi 2015
2.646 2011 Zi 2015
-2.354 2015 Zi 2015
-0.854 2016 Zi 2015
-3.854 2017 Zi 2015
-2.354 2021 Zi 2015
-2.354 2001 Ca 2010
1.146 2002 Ca 2010
-2.354 2003 Ca 2010
-0.854 2005 Ca 2010
-6.354 2011 Ca 2010
2.146 2019 Ca 2010")
df %>%
mutate(
tte=t-a,
Lg4=ifelse(tte<=-4, 1, 0),
Lg3=ifelse(tte==-3, 1, 0),
Lg2=ifelse(tte==-2, 1, 0),
Lg1=ifelse(tte==-1, 1, 0),
Index=ifelse(tte==0, 1, 0),
Ld1=ifelse(tte==1, 1, 0),
Ld2=ifelse(tte==2, 1, 0),
Ld3=ifelse(tte==3, 1, 0),
Ld4=ifelse(tte>=4, 1, 0),
)
#> y t w a tte Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
#> 1 -1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
#> 2 2.146 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
#> 3 3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
#> 4 2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
#> 5 -2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
#> 6 -0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
#> 7 -3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
#> 8 -2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
#> 9 -2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
#> 10 1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
#> 11 -2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
#> 12 -0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
#> 13 -6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
#> 14 2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Created on 2021-12-26 by the reprex package (v2.0.1)