Home > Net >  How to convert 'event' data to panel data?
How to convert 'event' data to panel data?

Time:11-05

I have data for a number of retail branches and the date at which they closed, if they didn't close then this is NA. I would like to expand these data so that they are a panel, with a 0/1 indicator to show if the branch was closed in that, or subsequent, years.

This is an example of the data format that I have and the data format that I would like. Here the data covers the period 2015 to 2019, 5 years. Branches A, B and D stay open, but branch C closes in 2016 and branch E in 2019.

branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)

have.df <- data.frame(branch, yearclosed)
have.df

branch <- c(rep("A",5), rep("B",5), rep("C",5), rep("D",5), rep("E",5))
year <- rep(2015:2019, 5)
closed <-c (rep(0,5), rep(0,5), 0,1,1,1,1, rep(0,5), 0,0,0,0,1)

want.df <- data.frame(branch, year, closed)
want.df

I have tried to experiment with converting from wide to long format but not made much progress. I could write a couple of for-loops, but these are generally not the best solution in R? Does anyone have any similar experiences they can share with me? Thanks.

CodePudding user response:

Code from question to generate have.df

branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)
have.df <- data.frame(branch, yearclosed)

Load libraries

library(dplyr)
library(tidyr)

Prepare have.df by triming of missings, renaming the yearclosed, and creating a new closed column that is 1 for every row.

have.df <- 
  have.df |> 
  filter(!is.na(yearclosed)) |> 
  rename(year = yearclosed) |> 
  mutate(closed = 1)

Using tidyr::expand_grid() we can create the first two columns of the data.frame you want and then join it with the modified have.df to obtain the result you are looking for. Using group_by() and fill() we can set all years after the initial closing year to 1. mutate() and coalesce() help us to set all missing values in closed to 0.

expand_grid(branch = LETTERS[1:5], year = 2015:2019) |> 
  left_join(have.df) |> 
  group_by(branch) |> 
  fill(closed) |> 
  mutate(closed = coalesce(closed, 0)) |> 
  print(n = 25)
#> Joining, by = c("branch", "year")
#> # A tibble: 25 × 3
#>    branch  year closed
#>    <chr>  <dbl>  <dbl>
#>  1 A       2015      0
#>  2 A       2016      0
#>  3 A       2017      0
#>  4 A       2018      0
#>  5 A       2019      0
#>  6 B       2015      0
#>  7 B       2016      0
#>  8 B       2017      0
#>  9 B       2018      0
#> 10 B       2019      0
#> 11 C       2015      0
#> 12 C       2016      1
#> 13 C       2017      1
#> 14 C       2018      1
#> 15 C       2019      1
#> 16 D       2015      0
#> 17 D       2016      0
#> 18 D       2017      0
#> 19 D       2018      0
#> 20 D       2019      0
#> 21 E       2015      0
#> 22 E       2016      0
#> 23 E       2017      0
#> 24 E       2018      0
#> 25 E       2019      1
  • Related