Home > Back-end >  How to create variables in a dataframe as there are days between two dates?
How to create variables in a dataframe as there are days between two dates?

Time:07-28

In a data.frame, I would like to create as many variables as there are days between two dates. Here's my example below. Many thanks in advance !

mydf <- data.frame(
  ident = c("a","b","c"),
  start = c("2023-01-01","2023-01-06","2023-01-24"),
  end = c("2023-01-03","2023-01-12","2023-01-30")
)

ident      start        end
1     a 2023-01-01 2023-01-03
2     b 2023-01-06 2023-01-12
3     c 2023-01-24 2023-01-30
# Expected output
mydf <- data.frame(
  ident = c("a","b","c"),
  start = c("2023-01-01","2023-01-03","2023-01-24"),
  end = c("2023-01-03","2023-01-12","2023-01-30"),
  `Sunday 1 January` = c(1,0,0),
  `Monday 2 January` = c(1,0,0),
  `Tuesday 3 January` = c(1,1,0),
  `Wednesday 4 January` = c(0,1,0)
)
  ident      start        end Sunday.1.January Monday.2.January Tuesday.3.January Wednesday.4.January
1     a 2023-01-01 2023-01-03                1                1                 1                   0
2     b 2023-01-03 2023-01-12                0                0                 1                   1
3     c 2023-01-24 2023-01-30                0                0                 0                   0

CodePudding user response:

Creating columns where the column name is effectively "data" is generally a bad idea for several reasons; it's generally better to work with the data in a long format and then pivot into something with "date-like column names" just for rendering. But if you really need it, then

library(dplyr)
library(tidyr) # unnest, pivot_wider
mydf %>%
  mutate(
    z = 1L,                              # uses as the "value" later
    across(c(start,end), as.Date),       # because we need seq.Date
    dates = Map(seq.Date, start, end, list(by = "day"))
  ) %>%
  unnest(dates) %>%
  pivot_wider(
    c(ident, start, end), names_from = "dates",
    values_from = "z", values_fill = 0L)
# # A tibble: 3 x 20
#   ident start      end        `2023-01-01` `2023-01-02` `2023-01-03` `2023-01-06` `2023-01-07` `2023-01-08` `2023-01-09` `2023-01-10` `2023-01-11` `2023-01-12` `2023-01-24` `2023-01-25` `2023-01-26` `2023-01-27` `2023-01-28`
#   <chr> <date>     <date>            <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>
# 1 a     2023-01-01 2023-01-03            1            1            1            0            0            0            0            0            0            0            0            0            0            0            0
# 2 b     2023-01-06 2023-01-12            0            0            0            1            1            1            1            1            1            1            0            0            0            0            0
# 3 c     2023-01-24 2023-01-30            0            0            0            0            0            0            0            0            0            0            1            1            1            1            1
# # ... with 2 more variables: 2023-01-29 <int>, 2023-01-30 <int>
  • Related