Home > Enterprise >  Convert holidays (holiday, from, to) to binary dummy variables (R)
Convert holidays (holiday, from, to) to binary dummy variables (R)

Time:05-23

I'm trying to convert a table of holiday data into a calendar table with dummy variables per holiday, but I can't get that done. Hope someone can help. I've been searching StackOverflow for some time, tried with tidyr. but still can's get it done.

The idea is simple, I want to convert the following data:

holidays <- data.frame(holiday = c("a", "b", "c"), 
                 datefrom = as.Date(c("1/1/2022", "1/2/2022", "1/3/2022"), format = "%d/%m/%y"),
                 dateto = as.Date(c("1/1/2022", "5/2/2022", "7/3/2022"), format = "%d/%m/%y"))

to this desired output:

calendar table

Hope someone can help!

CodePudding user response:

library(tidyverse)

days <- data.frame(date = as.Date(c("1/1/2022", "1/2/2022", "31/5/2022"), format = "%d/%m/%y"))
holidays <- data.frame(
  holiday = c("a", "b", "c"),
  datefrom = as.Date(c("1/1/2022", "1/2/2022", "1/3/2022"), format = "%d/%m/%y"),
  dateto = as.Date(c("1/1/2022", "5/2/2022", "7/3/2022"), format = "%d/%m/%y")
)

holidays %>%
  expand_grid(days) %>%
  filter(date <= dateto & date >= datefrom) %>%
  select(date, holiday) %>%
  mutate(value = 1) %>%
  complete(date = days$date, holiday = holidays$holiday, fill = list(value = 0)) %>%
  pivot_wider(names_from = holiday, values_from = value)
#> # A tibble: 3 × 4
#>   date           a     b     c
#>   <date>     <dbl> <dbl> <dbl>
#> 1 2020-01-01     1     0     0
#> 2 2020-02-01     0     1     0
#> 3 2020-05-31     0     0     0

Created on 2022-05-23 by the reprex package (v2.0.0)

CodePudding user response:

You can try:

. <- Map(seq, holidays$datefrom, holidays$dateto, 1)
. <- setNames(., holidays$holiday)
res <- sort(unique(do.call(c, .)))
data.frame(date=res,  sapply(., "%in%", x=res))
#         date a b c
#a  2020-01-01 1 0 0
#b1 2020-02-01 0 1 0
#b2 2020-02-02 0 1 0
#b3 2020-02-03 0 1 0
#b4 2020-02-04 0 1 0
#b5 2020-02-05 0 1 0
#c1 2020-03-01 0 0 1
#c2 2020-03-02 0 0 1
#c3 2020-03-03 0 0 1
#c4 2020-03-04 0 0 1
#c5 2020-03-05 0 0 1
#c6 2020-03-06 0 0 1
#c7 2020-03-07 0 0 1

CodePudding user response:

Here is one approach:

library(tidyverse)
holidays %>% 
  rowwise() %>% 
  mutate(date = list(seq.Date(datefrom,dateto,1))) %>% 
  unnest(date) %>% 
  pivot_wider(id_cols=date,names_from=holiday,values_from=holiday,values_fn = length,values_fill = 0)

Output:

   date           a     b     c
   <date>     <int> <int> <int>
 1 2020-01-01     1     0     0
 2 2020-02-01     0     1     0
 3 2020-02-02     0     1     0
 4 2020-02-03     0     1     0
 5 2020-02-04     0     1     0
 6 2020-02-05     0     1     0
 7 2020-03-01     0     0     1
 8 2020-03-02     0     0     1
 9 2020-03-03     0     0     1
10 2020-03-04     0     0     1
11 2020-03-05     0     0     1
12 2020-03-06     0     0     1
13 2020-03-07     0     0     1

And here is another that starts with apply(), and then binds the rows and updates the NA

bind_rows(apply(holidays,1,\(i) {
  setNames(data.frame(seq(as.Date(i[2]), as.Date(i[3]),1),1),c("date",i[1]))
})) %>% mutate(across(a:c, ~if_else(is.na(.x),0,1)))
  •  Tags:  
  • r
  • Related