new rows for every day from in-between dates-CodePudding

I have to create a database with a single row for every day in the interval between the two dates (date_in - date_out). I have to use R.

How can I do this?

My data:

  id           date_in          date_out days
1  1 13May2022 0:00:00 03Jul2022 0:00:00   51
2  3 10Nov2020 0:00:00 15Nov2020 0:00:00    5
3  4 25Feb2020 0:00:00 05Apr2020 0:00:00   40

CodePudding user response：

Here is an option. First, change dates into dates (yours might already be), then we map out all the dates from the start to the end, lastly we unnest.

library(tidyverse)

#data
df <- read.csv(textConnection("id,           date_in,          date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00,   51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00,    5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00,   40"))  |>
  select(-X) 

#solution
df|>
  mutate(across(starts_with("date"), \(x) lubridate::dmy_hms(x) |> 
                  lubridate::date()),
         full_date = map2(date_in, date_out, \(x,y) seq(x, y, by = "1 day"))) |>
  unnest_longer(full_date) |>
  select(id, date = full_date)
#> # A tibble: 99 x 2
#>       id date      
#>    <int> <date>    
#>  1     1 2022-05-13
#>  2     1 2022-05-14
#>  3     1 2022-05-15
#>  4     1 2022-05-16
#>  5     1 2022-05-17
#>  6     1 2022-05-18
#>  7     1 2022-05-19
#>  8     1 2022-05-20
#>  9     1 2022-05-21
#> 10     1 2022-05-22
#> # ... with 89 more rows

CodePudding user response：

Here is a similar approach to AndS.'s, but using summarize:

library(tidyverse)
library(lubridate)

# data
df <- read.csv(textConnection("id,           date_in,          date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00,   51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00,    5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00,   40"))  |>
    select(-X) 

# answer
df |> 
    mutate(across(c(date_in, date_out), ~date(dmy_hms(.x)))) |> 
    group_by(id) |> 
    summarize(date=seq(date_in, date_out, by="1 day"))