I'm looking to do hazard analysis, but before I do that I want to clean my dataset so that I have only the data from right before a "death", if you will. I'm studying countries, and since countries don't "die" per se, I need to find the point where an event occurs, coded as a '1' in an indicator column, and then generate a column that has 0s everywhere except for the n periods before my indicator column hits '1'.
For example, given the indicator vector below, I am looking for a way to generate lag_column.
number_of_years = 5
year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
lag_column = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 0) #I need to make this, the 5 years before the event occurs
Thank you!
CodePudding user response:
I'm sure there is a better way to do this. Having said that, here is what worked for me.
- Sample data
library(tibble)
df <- tibble(year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
             indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0))
- Note: I added an extra 1 to the data to check what happens when the windows overlap.
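number_of_years <- 5
# Row positions where the event occurs (which() avoids grep's pattern matching, e.g. on 10)
index <- which(df$indicator == 1)
lag_index <- integer(0)
for (ii in seq_along(index)) {
  # The event row plus the (number_of_years - 1) rows before it, never going below row 1
  lag_spots <- seq(from = max(1, index[ii] - (number_of_years - 1)), to = index[ii])
  lag_index <- append(lag_index, lag_spots)
}
# Overlapping windows produce duplicate positions, so keep each row once
lag_index <- unique(lag_index)
df$lag_column <- rep(0, times = nrow(df))
df$lag_column[lag_index] <- 1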
Output
> df
# A tibble: 10 x 3
    year indicator lag_column
   <dbl>     <dbl>      <dbl>
 1     1         0          0
 2     2         0          1
 3     3         0          1
 4     4         0          1
 5     5         0          1
 6     6         1          1
 7     7         0          1
 8     8         0          1
 9     9         1          1
10    10         0          0
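For what it's worth, the same idea can probably be written without the loop. Below is a minimal base R sketch, not a definitive implementation. It assumes the rows are already sorted by year. flag_window is a hypothetical helper name (it does not come from any package), and the panel data frame with its country column is made-up illustration data, since the question mentions countries but does not show that column.

```r
# Hypothetical helper: mark each event row and the (n - 1) rows before it with 1.
flag_window <- function(indicator, n) {
  out <- rep(0, length(indicator))
  events <- which(indicator == 1)
  if (length(events) == 0) return(out)
  # For every event position, step back 0..(n - 1) rows
  rows <- as.vector(outer(events, 0:(n - 1), "-"))
  # Drop positions before the first row; overlapping windows collapse via unique()
  rows <- unique(rows[rows >= 1])
  out[rows] <- 1
  out
}

# The example from the question
flag_window(c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0), n = 5)
# 0 1 1 1 1 1 0 0 0 0

# Assumed panel layout: one block of rows per country, sorted by year
panel <- data.frame(
  country   = rep(c("A", "B"), each = 5),
  year      = rep(1:5, times = 2),
  indicator = c(0, 1, 0, 0, 0,  0, 1, 0, 0, 0)
)
# ave() applies the helper within each country, so a window never
# reaches back into the previous country's rows
panel$lag_column <- ave(panel$indicator, panel$country,
                        FUN = function(x) flag_window(x, n = 3))
```

With several countries stacked in one data frame, applying the window per group matters: country B's event in its second year would otherwise flag the last row of country A.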