r lag and lead indicator by group

Time:12-27

I have a dataset with an outcome variable, owl weight [y], and the year in which the outcome was recorded [t]. A third variable gives the type of feed provided to the baby owls [w], and a fourth column gives the year the new feed was introduced [a].

  y       t      w       a
 -1.354   1994   Zi      2015
  2.146   1994   Zi      2015
  3.146   2001   Zi      2015
  2.646   2011   Zi      2015
 -2.354   2015   Zi      2015
 -0.854   2016   Zi      2015
 -3.854   2017   Zi      2015
 -2.354   2021   Zi      2015 
 -2.354   2001   Ca      2010 
  1.146   2002   Ca      2010 
 -2.354   2003   Ca      2010
 -0.854   2005   Ca      2010
 -6.354   2011   Ca      2010
  2.146   2019   Ca      2010

Step 1: I would like to create a time-to-event column relative to the index column a, defined as TimeToEvent = t - a.

  y       t      w       a     TimeToEvent
 -1.354   1994   Zi      2015  -21
  2.146   1994   Zi      2015  -21
  3.146   2001   Zi      2015  -14
  2.646   2011   Zi      2015  -4
 -2.354   2015   Zi      2015   0
 -0.854   2016   Zi      2015   1
 -3.854   2017   Zi      2015   2
 -2.354   2021   Zi      2015   6
 -2.354   2001   Ca      2010  -9
  1.146   2002   Ca      2010  -8
 -2.354   2003   Ca      2010  -7
 -0.854   2005   Ca      2010  -5
  1.147   2010   Ca      2010   0
 -6.354   2011   Ca      2010   1
  2.146   2019   Ca      2010   9
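Step 1 is a single vectorised subtraction. A minimal base R sketch, using only a few rows from the table above (the data frame name `owls` is purely illustrative):

```r
# A few rows from the example data; `owls` is an illustrative name
owls <- data.frame(
  y = c(-1.354, 2.646, -2.354, -0.854),
  t = c(1994, 2011, 2015, 2016),
  w = "Zi",
  a = 2015
)

# Time to event: observation year minus the year the feed was introduced
owls$TimeToEvent <- owls$t - owls$a
print(owls)
```

Negative values are years before the new feed, 0 is the index year, and positive values are years after it.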

Step 2: I would like to create 4 lag and 4 lead indicator variables, like this:

  y       t      w       a     TimeToEvent  Lg4  Lg3  Lg2  Lg1  Index  Ld1  Ld2  Ld3  Ld4
 -1.354   1994   Zi      2015  -21          1    0    0    0    0      0    0    0    0  
  2.146   1998   Zi      2015  -17          1    0    0    0    0      0    0    0    0  
 -3.146   2001   Zi      2015  -14          1    0    0    0    0      0    0    0    0 
  2.646   2011   Zi      2015  -4           1    0    0    0    0      0    0    0    0 
  0.133   2013   Zi      2015  -2           0    0    1    0    0      0    0    0    0 
 -2.354   2015   Zi      2015   0           0    0    0    0    1      0    0    0    0 
 -0.854   2016   Zi      2015   1           0    0    0    0    0      1    0    0    0
 -3.854   2017   Zi      2015   2           0    0    0    0    0      0    1    0    0
 -2.354   2021   Zi      2015   6           0    0    0    0    0      0    0    0    1
 -2.354   2001   Ca      2010  -9           1    0    0    0    0      0    0    0    0 
  1.146   2002   Ca      2010  -8           1    0    0    0    0      0    0    0    0 
 -2.354   2003   Ca      2010  -7           1    0    0    0    0      0    0    0    0 
 -0.854   2005   Ca      2010  -5           1    0    0    0    0      0    0    0    0 
  1.147   2010   Ca      2010   0           0    0    0    0    1      0    0    0    0 
 -6.354   2011   Ca      2010   1           0    0    0    0    0      1    0    0    0
  2.146   2019   Ca      2010   9           0    0    0    0    0      0    0    0    1

As you can see, I am only interested in tracking the leads and lags for 4 time periods before and after the index year [column a].

  y       t      w       a     TimeToEvent  Lg4  Lg3  Lg2  Lg1  Index  Ld1  Ld2  Ld3  Ld4
  .       .      .       .      .           .    .    .    .    .      .    .    .    .  
  .       .      .       .      .           .    .    .    .    .      .    .    .    . 
  2.646   2011   Zi      2015  -4           1    0    0    0    0      0    0    0    0 
  0.133   2013   Zi      2015  -2           0    0    1    0    0      0    0    0    0 
 -2.354   2015   Zi      2015   0           0    0    0    0    1      0    0    0    0 
 -0.854   2016   Zi      2015   1           0    0    0    0    0      1    0    0    0
 -3.854   2017   Zi      2015   2           0    0    0    0    0      0    1    0    0
  .       .      .       .      .           .    .    .    .    .      .    .    .    .  
  .       .      .       .      .           .    .    .    .    .      .    .    .    . 
  1.147   2010   Ca      2010   0           0    0    0    0    1      0    0    0    0 
 -6.354   2011   Ca      2010   1           0    0    0    0    0      1    0    0    0

Any observations falling outside the 4 lag or 4 lead periods are assigned 1 in the boundary column on the corresponding side, depending on whether t < a or t > a. For example, the first three observations have time to event -21, -17 and -14, so they are assigned 1 in the Lg4 column (the boundary). Similarly, the first 4 observations of group w = Ca have time to event -9, -8, -7 and -5, so they are also assigned 1 in the Lg4 column.

  y       t      w       a     TimeToEvent  Lg4  Lg3  Lg2  Lg1  Index  Ld1  Ld2  Ld3  Ld4
 -1.354   1994   Zi      2015  -21          1    0    0    0    0      0    0    0    0  
  2.146   1998   Zi      2015  -17          1    0    0    0    0      0    0    0    0  
 -3.146   2001   Zi      2015  -14          1    0    0    0    0      0    0    0    0 

 -2.354   2001   Ca      2010  -9           1    0    0    0    0      0    0    0    0 
  1.146   2002   Ca      2010  -8           1    0    0    0    0      0    0    0    0 
 -2.354   2003   Ca      2010  -7           1    0    0    0    0      0    0    0    0 
 -0.854   2005   Ca      2010  -5           1    0    0    0    0      0    0    0    0 

Observations whose time to event falls beyond the lead window are assigned 1 at the other end (column Ld4):

  y       t      w       a     TimeToEvent  Lg4  Lg3  Lg2  Lg1  Index  Ld1  Ld2  Ld3  Ld4
 -2.354   2021   Zi      2015   6           0    0    0    0    0      0    0    0    1
  2.146   2019   Ca      2010   9           0    0    0    0    0      0    0    0    1
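The boundary rule described above amounts to clamping the time to event to the range -4 to 4 before building the indicators. A minimal sketch with base R's pmin() and pmax(), assuming whole-year differences; the variable names are illustrative:

```r
# Time-to-event values taken from the example tables
tte <- c(-21, -17, -14, -4, -2, 0, 1, 2, 6, 9)

# Clamp to [-4, 4]: anything further out collapses onto the boundary
clamped <- pmax(pmin(tte, 4), -4)

# Boundary indicators: 1 if at or beyond the edge of the window
Lg4 <- as.integer(clamped == -4)
Ld4 <- as.integer(clamped == 4)

print(data.frame(tte, clamped, Lg4, Ld4))
```

After clamping, every indicator column can be built with a plain equality test, so the two boundary columns need no special handling.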

The final expected dataset, as shown in Step 2 above:

  y       t      w       a     TimeToEvent  Lg4  Lg3  Lg2  Lg1  Index  Ld1  Ld2  Ld3  Ld4
 -1.354   1994   Zi      2015  -21          1    0    0    0    0      0    0    0    0  
  2.146   1998   Zi      2015  -17          1    0    0    0    0      0    0    0    0  
 -3.146   2001   Zi      2015  -14          1    0    0    0    0      0    0    0    0 
 
  2.646   2011   Zi      2015  -4           1    0    0    0    0      0    0    0    0 
  0.133   2013   Zi      2015  -2           0    0    1    0    0      0    0    0    0 
 -2.354   2015   Zi      2015   0           0    0    0    0    1      0    0    0    0 
 -0.854   2016   Zi      2015   1           0    0    0    0    0      1    0    0    0
 -3.854   2017   Zi      2015   2           0    0    0    0    0      0    1    0    0
 
 -2.354   2021   Zi      2015   6           0    0    0    0    0      0    0    0    1
 
 -2.354   2001   Ca      2010  -9           1    0    0    0    0      0    0    0    0 
  1.146   2002   Ca      2010  -8           1    0    0    0    0      0    0    0    0 
 -2.354   2003   Ca      2010  -7           1    0    0    0    0      0    0    0    0 
 -0.854   2005   Ca      2010  -5           1    0    0    0    0      0    0    0    0 
 
  1.147   2010   Ca      2010   0           0    0    0    0    1      0    0    0    0 
 -6.354   2011   Ca      2010   1           0    0    0    0    0      1    0    0    0
 
  2.146   2019   Ca      2010   9           0    0    0    0    0      0    0    0    1

Any suggestions regarding this are much appreciated. Thanks.

CodePudding user response:

It seems that you are only testing on your time to event, so a series of ifelse() calls will do:

library(tidyverse)
df <- read.table(header = TRUE, text = "y       t      w       a
-1.354   1994   Zi      2015
2.146   1994   Zi      2015
3.146   2001   Zi      2015
2.646   2011   Zi      2015
-2.354   2015   Zi      2015
-0.854   2016   Zi      2015
-3.854   2017   Zi      2015
-2.354   2021   Zi      2015 
-2.354   2001   Ca      2010 
1.146   2002   Ca      2010 
-2.354   2003   Ca      2010
-0.854   2005   Ca      2010
-6.354   2011   Ca      2010
2.146   2019   Ca      2010")

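df %>% 
    mutate(
        # time to event: observation year minus feed-introduction year
        tte = t - a, 
        # Lg4 is the lag boundary: it also catches everything earlier than -4
        Lg4 = ifelse(tte <= -4, 1, 0), 
        Lg3 = ifelse(tte == -3, 1, 0), 
        Lg2 = ifelse(tte == -2, 1, 0), 
        Lg1 = ifelse(tte == -1, 1, 0), 
        Index = ifelse(tte == 0, 1, 0), 
        Ld1 = ifelse(tte == 1, 1, 0),
        Ld2 = ifelse(tte == 2, 1, 0), 
        Ld3 = ifelse(tte == 3, 1, 0), 
        # Ld4 is the lead boundary: it also catches everything later than 4
        Ld4 = ifelse(tte >= 4, 1, 0)
    )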
#>         y    t  w    a tte Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
#> 1  -1.354 1994 Zi 2015 -21   1   0   0   0     0   0   0   0   0
#> 2   2.146 1994 Zi 2015 -21   1   0   0   0     0   0   0   0   0
#> 3   3.146 2001 Zi 2015 -14   1   0   0   0     0   0   0   0   0
#> 4   2.646 2011 Zi 2015  -4   1   0   0   0     0   0   0   0   0
#> 5  -2.354 2015 Zi 2015   0   0   0   0   0     1   0   0   0   0
#> 6  -0.854 2016 Zi 2015   1   0   0   0   0     0   1   0   0   0
#> 7  -3.854 2017 Zi 2015   2   0   0   0   0     0   0   1   0   0
#> 8  -2.354 2021 Zi 2015   6   0   0   0   0     0   0   0   0   1
#> 9  -2.354 2001 Ca 2010  -9   1   0   0   0     0   0   0   0   0
#> 10  1.146 2002 Ca 2010  -8   1   0   0   0     0   0   0   0   0
#> 11 -2.354 2003 Ca 2010  -7   1   0   0   0     0   0   0   0   0
#> 12 -0.854 2005 Ca 2010  -5   1   0   0   0     0   0   0   0   0
#> 13 -6.354 2011 Ca 2010   1   0   0   0   0     0   1   0   0   0
#> 14  2.146 2019 Ca 2010   9   0   0   0   0     0   0   0   0   1

Created on 2021-12-26 by the reprex package (v2.0.1)
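If the window width might change, the indicator columns can be generated rather than typed out one by one. A rough base R sketch under the same assumptions (whole-year differences, boundary columns absorbing everything outside the window); the helper name `add_event_dummies` and its `k` argument are made up for illustration and do not come from any package:

```r
# Hypothetical helper: adds tte plus Lg<k>..Lg1, Index, Ld1..Ld<k> columns
add_event_dummies <- function(df, k = 4) {
  tte <- df$t - df$a
  clamped <- pmax(pmin(tte, k), -k)      # collapse out-of-window values onto the boundary
  offsets <- -k:k
  # One column per offset; TRUE/FALSE converted to 1/0
  dummies <- outer(clamped, offsets, "==") * 1L
  colnames(dummies) <- c(paste0("Lg", k:1), "Index", paste0("Ld", 1:k))
  cbind(df, tte = tte, dummies)
}

# Small example using years from the question's Zi group
df <- data.frame(t = c(1994, 2011, 2013, 2015, 2016, 2021), a = 2015)
res <- add_event_dummies(df)
print(res)
```

Because a and t sit on the same row, no explicit grouping by w is needed; each row is compared against its own index year.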
