I have to create a database with a single row for every day in the interval between the two dates (date_in - date_out). I have to use R.
How can I do this?
My data:
id date_in date_out days
1 1 13May2022 0:00:00 03Jul2022 0:00:00 51
2 3 10Nov2020 0:00:00 15Nov2020 0:00:00 5
3 4 25Feb2020 0:00:00 05Apr2020 0:00:00 40
CodePudding user response:
Here is an option. First, change dates into dates (yours might already be), then we map out all the dates from the start to the end, lastly we unnest.
library(tidyverse)
#data
df <- read.csv(textConnection("id, date_in, date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00, 51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00, 5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00, 40")) |>
select(-X)
#solution
df|>
mutate(across(starts_with("date"), \(x) lubridate::dmy_hms(x) |>
lubridate::date()),
full_date = map2(date_in, date_out, \(x,y) seq(x, y, by = "1 day"))) |>
unnest_longer(full_date) |>
select(id, date = full_date)
#> # A tibble: 99 x 2
#> id date
#> <int> <date>
#> 1 1 2022-05-13
#> 2 1 2022-05-14
#> 3 1 2022-05-15
#> 4 1 2022-05-16
#> 5 1 2022-05-17
#> 6 1 2022-05-18
#> 7 1 2022-05-19
#> 8 1 2022-05-20
#> 9 1 2022-05-21
#> 10 1 2022-05-22
#> # ... with 89 more rows
CodePudding user response:
Here is a similar approach to AndS.'s, but using summarize
:
library(tidyverse)
library(lubridate)
# data
df <- read.csv(textConnection("id, date_in, date_out, days,
1, 13May2022 0:00:00, 03Jul2022 0:00:00, 51,
3, 10Nov2020 0:00:00, 15Nov2020 0:00:00, 5,
4, 25Feb2020 0:00:00, 05Apr2020 0:00:00, 40")) |>
select(-X)
# answer
df |>
mutate(across(c(date_in, date_out), ~date(dmy_hms(.x)))) |>
group_by(id) |>
summarize(date=seq(date_in, date_out, by="1 day"))