How to convert 'event' data to panel data?-CodePudding

I have data for a number of retail branches and the date at which they closed, if they didn't close then this is NA. I would like to expand these data so that they are a panel, with a 0/1 indicator to show if the branch was closed in that, or subsequent, years.

This is an example of the data format that I have and the data format that I would like. Here the data covers the period 2015 to 2019, 5 years. Branches A, B and D stay open, but branch C closes in 2016 and branch E in 2019.

branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)

have.df <- data.frame(branch, yearclosed)
have.df

branch <- c(rep("A",5), rep("B",5), rep("C",5), rep("D",5), rep("E",5))
year <- rep(2015:2019, 5)
closed <-c (rep(0,5), rep(0,5), 0,1,1,1,1, rep(0,5), 0,0,0,0,1)

want.df <- data.frame(branch, year, closed)
want.df

I have tried to experiment with converting from wide to long format but not made much progress. I could write a couple of for-loops, but these are generally not the best solution in R? Does anyone have any similar experiences they can share with me? Thanks.

CodePudding user response：

Code from question to generate have.df

branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)
have.df <- data.frame(branch, yearclosed)

Load libraries

library(dplyr)
library(tidyr)

Prepare have.df by triming of missings, renaming the yearclosed, and creating a new closed column that is 1 for every row.

have.df <- 
  have.df |> 
  filter(!is.na(yearclosed)) |> 
  rename(year = yearclosed) |> 
  mutate(closed = 1)

Using tidyr::expand_grid() we can create the first two columns of the data.frame you want and then join it with the modified have.df to obtain the result you are looking for. Using group_by() and fill() we can set all years after the initial closing year to 1. mutate() and coalesce() help us to set all missing values in closed to 0.

expand_grid(branch = LETTERS[1:5], year = 2015:2019) |> 
  left_join(have.df) |> 
  group_by(branch) |> 
  fill(closed) |> 
  mutate(closed = coalesce(closed, 0)) |> 
  print(n = 25)
#> Joining, by = c("branch", "year")
#> # A tibble: 25 × 3
#>    branch  year closed
#>    <chr>  <dbl>  <dbl>
#>  1 A       2015      0
#>  2 A       2016      0
#>  3 A       2017      0
#>  4 A       2018      0
#>  5 A       2019      0
#>  6 B       2015      0
#>  7 B       2016      0
#>  8 B       2017      0
#>  9 B       2018      0
#> 10 B       2019      0
#> 11 C       2015      0
#> 12 C       2016      1
#> 13 C       2017      1
#> 14 C       2018      1
#> 15 C       2019      1
#> 16 D       2015      0
#> 17 D       2016      0
#> 18 D       2017      0
#> 19 D       2018      0
#> 20 D       2019      0
#> 21 E       2015      0
#> 22 E       2016      0
#> 23 E       2017      0
#> 24 E       2018      0
#> 25 E       2019      1