I'm trying to convert a table of holiday data into a calendar table with dummy variables per holiday, but I can't get that done. Hope someone can help. I've been searching StackOverflow for some time, tried with tidyr. but still can's get it done.
The idea is simple, I want to convert the following data:
holidays <- data.frame(holiday = c("a", "b", "c"),
datefrom = as.Date(c("1/1/2022", "1/2/2022", "1/3/2022"), format = "%d/%m/%y"),
dateto = as.Date(c("1/1/2022", "5/2/2022", "7/3/2022"), format = "%d/%m/%y"))
to this desired output:
Hope someone can help!
CodePudding user response:
library(tidyverse)
days <- data.frame(date = as.Date(c("1/1/2022", "1/2/2022", "31/5/2022"), format = "%d/%m/%y"))
holidays <- data.frame(
holiday = c("a", "b", "c"),
datefrom = as.Date(c("1/1/2022", "1/2/2022", "1/3/2022"), format = "%d/%m/%y"),
dateto = as.Date(c("1/1/2022", "5/2/2022", "7/3/2022"), format = "%d/%m/%y")
)
holidays %>%
expand_grid(days) %>%
filter(date <= dateto & date >= datefrom) %>%
select(date, holiday) %>%
mutate(value = 1) %>%
complete(date = days$date, holiday = holidays$holiday, fill = list(value = 0)) %>%
pivot_wider(names_from = holiday, values_from = value)
#> # A tibble: 3 × 4
#> date a b c
#> <date> <dbl> <dbl> <dbl>
#> 1 2020-01-01 1 0 0
#> 2 2020-02-01 0 1 0
#> 3 2020-05-31 0 0 0
Created on 2022-05-23 by the reprex package (v2.0.0)
CodePudding user response:
You can try:
. <- Map(seq, holidays$datefrom, holidays$dateto, 1)
. <- setNames(., holidays$holiday)
res <- sort(unique(do.call(c, .)))
data.frame(date=res, sapply(., "%in%", x=res))
# date a b c
#a 2020-01-01 1 0 0
#b1 2020-02-01 0 1 0
#b2 2020-02-02 0 1 0
#b3 2020-02-03 0 1 0
#b4 2020-02-04 0 1 0
#b5 2020-02-05 0 1 0
#c1 2020-03-01 0 0 1
#c2 2020-03-02 0 0 1
#c3 2020-03-03 0 0 1
#c4 2020-03-04 0 0 1
#c5 2020-03-05 0 0 1
#c6 2020-03-06 0 0 1
#c7 2020-03-07 0 0 1
CodePudding user response:
Here is one approach:
library(tidyverse)
holidays %>%
rowwise() %>%
mutate(date = list(seq.Date(datefrom,dateto,1))) %>%
unnest(date) %>%
pivot_wider(id_cols=date,names_from=holiday,values_from=holiday,values_fn = length,values_fill = 0)
Output:
date a b c
<date> <int> <int> <int>
1 2020-01-01 1 0 0
2 2020-02-01 0 1 0
3 2020-02-02 0 1 0
4 2020-02-03 0 1 0
5 2020-02-04 0 1 0
6 2020-02-05 0 1 0
7 2020-03-01 0 0 1
8 2020-03-02 0 0 1
9 2020-03-03 0 0 1
10 2020-03-04 0 0 1
11 2020-03-05 0 0 1
12 2020-03-06 0 0 1
13 2020-03-07 0 0 1
And here is another that starts with apply()
, and then binds the rows and updates the NA
bind_rows(apply(holidays,1,\(i) {
setNames(data.frame(seq(as.Date(i[2]), as.Date(i[3]),1),1),c("date",i[1]))
})) %>% mutate(across(a:c, ~if_else(is.na(.x),0,1)))