Home > Blockchain >  Create a dummy variable indicating if an event occured the past 2 years
Create a dummy variable indicating if an event occured the past 2 years

Time:04-05

I have a longitudinal dataset where I want to create a column that indicates if an event occurred for an individual in the past two years (t-2). I've created toy data that share the main characteristics with my data set (see code below).

set.seed(123)


df <- data.frame(id = sort(rep(1:10,5)),
                 time = rep(2011:2015, 10),
                 event = rbinom(50, 1, 0.2))
                 
head(df,10)   

# Output
   id time event
1   1 2011     0
2   1 2012     0
3   1 2013     0
4   1 2014     1
5   1 2015     1
6   2 2011     0
7   2 2012     0
8   2 2013     1
9   2 2014     0
10  2 2015     0

In this data, I want to create a new column occurred that indicates if the event occurred in the past two years. For the first 10 rows, this would lead to data looking like this:

   id time event occured
1   1 2011     0       0
2   1 2012     0       0
3   1 2013     0       0
4   1 2014     1       1
5   1 2015     1       1
6   2 2011     0       0
7   2 2012     0       0
8   2 2013     1       1
9   2 2014     0       1
10  2 2015     0       1

In the best world, I want the number of years going backward to be a parameter that can be changed (i.e., making it relatively easy to allow occurred to be constructed so it indicates if the event happened the past 1 year backward or the past 4 years as well).

Thank you!

CodePudding user response:

A data.table approach

assumes there are no missing years in your data.. So all years 2011:2015 are either 0 or 1.

library(data.table)
# make it a data.table
setDT(df)
# 
df[, occured := as.numeric(
  frollsum(event, n = 3, align = "right", fill = event[1]) > 0), 
  by = .(id)]

#    id time event occured
# 1:  1 2011     0       0
# 2:  1 2012     0       0
# 3:  1 2013     0       0
# 4:  1 2014     1       1
# 5:  1 2015     1       1
# 6:  2 2011     0       0
# 7:  2 2012     0       0
# 8:  2 2013     1       1
# 9:  2 2014     0       1
#10:  2 2015     0       1
#11:  3 2011     1       1
#12:  3 2012     0       1
#13:  3 2013     0       1
#14:  3 2014     0       0
#15:  3 2015     0       0
#16:  4 2011     1       1
#17:  4 2012     0       1
#18:  4 2013     0       1
#19:  4 2014     0       0
#20:  4 2015     1       1
#  ...

CodePudding user response:

Assuming you want to do this per group, you can use zoo::rollmean() together with ceiling():

library(dplyr)

# Will calculate for t - n periods, n is a parameter which is easy to change
n <- 2

df %>% 
  group_by(id) %>% 
  arrange(id, time) %>% 
  mutate(
    occurred = ceiling(zoo::rollmean(event, k = n, fill = event[1], align = "right"))
  )
#> # A tibble: 50 × 4
#> # Groups:   id [10]
#>       id  time event occurred
#>    <int> <int> <int>    <dbl>
#>  1     1  2011     0        0
#>  2     1  2012     0        0
#>  3     1  2013     0        0
#>  4     1  2014     1        1
#>  5     1  2015     1        1
#>  6     2  2011     0        0
#>  7     2  2012     0        0
#>  8     2  2013     1        1
#>  9     2  2014     0        1
#> 10     2  2015     0        0
#> # … with 40 more rows

Created on 2022-04-04 by the reprex package (v2.0.1)

CodePudding user response:

With zoo::rollapply and FUN = max:

library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(occured = rollapply(event, 3, max, align = "right", fill = event[1]))

# A tibble: 50 x 4
# Groups:   id [10]
      id  time event occured
   <int> <int> <int>   <int>
 1     1  2011     0       0
 2     1  2012     0       0
 3     1  2013     0       0
 4     1  2014     1       1
 5     1  2015     1       1
 6     2  2011     0       0
 7     2  2012     0       0
 8     2  2013     1       1
 9     2  2014     0       1
10     2  2015     0       1
11     3  2011     1       1
12     3  2012     0       1
13     3  2013     0       1
14     3  2014     0       0
15     3  2015     0       0
16     4  2011     1       1
17     4  2012     0       1
18     4  2013     0       1
19     4  2014     0       0
20     4  2015     1       1
21     5  2011     1       1
22     5  2012     0       1
23     5  2013     0       1
24     5  2014     1       1
25     5  2015     0       1
26     6  2011     0       0
27     6  2012     0       0
28     6  2013     0       0
29     6  2014     0       0
30     6  2015     0       0
# ... with 20 more rows
  • Related