I have data for a number of retail branches and the date at which they closed, if they didn't close then this is NA. I would like to expand these data so that they are a panel, with a 0/1 indicator to show if the branch was closed in that, or subsequent, years.
This is an example of the data format that I have and the data format that I would like. Here the data covers the period 2015 to 2019, 5 years. Branches A, B and D stay open, but branch C closes in 2016 and branch E in 2019.
branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)
have.df <- data.frame(branch, yearclosed)
have.df
branch <- c(rep("A",5), rep("B",5), rep("C",5), rep("D",5), rep("E",5))
year <- rep(2015:2019, 5)
closed <-c (rep(0,5), rep(0,5), 0,1,1,1,1, rep(0,5), 0,0,0,0,1)
want.df <- data.frame(branch, year, closed)
want.df
I have tried to experiment with converting from wide to long format but not made much progress. I could write a couple of for-loops, but these are generally not the best solution in R? Does anyone have any similar experiences they can share with me? Thanks.
CodePudding user response:
Code from question to generate have.df
branch <- LETTERS[1:5]
yearclosed <- c(NA, NA, 2016, NA, 2019)
have.df <- data.frame(branch, yearclosed)
Load libraries
library(dplyr)
library(tidyr)
Prepare have.df
by triming of missings, renaming the yearclosed
, and
creating a new closed
column that is 1
for every row.
have.df <-
have.df |>
filter(!is.na(yearclosed)) |>
rename(year = yearclosed) |>
mutate(closed = 1)
Using tidyr::expand_grid()
we can create the first two columns
of the data.frame you want and then join it with the modified have.df
to
obtain the result you are looking for. Using group_by()
and fill()
we can set all years after the initial closing year to 1. mutate()
and coalesce()
help us to set all missing values in closed
to 0
.
expand_grid(branch = LETTERS[1:5], year = 2015:2019) |>
left_join(have.df) |>
group_by(branch) |>
fill(closed) |>
mutate(closed = coalesce(closed, 0)) |>
print(n = 25)
#> Joining, by = c("branch", "year")
#> # A tibble: 25 × 3
#> branch year closed
#> <chr> <dbl> <dbl>
#> 1 A 2015 0
#> 2 A 2016 0
#> 3 A 2017 0
#> 4 A 2018 0
#> 5 A 2019 0
#> 6 B 2015 0
#> 7 B 2016 0
#> 8 B 2017 0
#> 9 B 2018 0
#> 10 B 2019 0
#> 11 C 2015 0
#> 12 C 2016 1
#> 13 C 2017 1
#> 14 C 2018 1
#> 15 C 2019 1
#> 16 D 2015 0
#> 17 D 2016 0
#> 18 D 2017 0
#> 19 D 2018 0
#> 20 D 2019 0
#> 21 E 2015 0
#> 22 E 2016 0
#> 23 E 2017 0
#> 24 E 2018 0
#> 25 E 2019 1