I have a dataframe which looks something like:
start_date end_date name value
1 2020-01-01 2020-01-06 x 2
2 2020-01-05 2020-01-07 y 4
I'd like to spread the dates by day, so that the name
variables become columns taking their respective value
, as so:
date x y
1 2020-01-01 2 NA
2 2020-01-02 2 NA
3 2020-01-03 2 NA
4 2020-01-04 2 NA
5 2020-01-05 2 4
6 2020-01-06 2 4
7 2020-01-07 NA 4
I would be interested in a base R
solution, but also solutions using dplyr
and/or lubridate
CodePudding user response:
Another possible solution, which combines purrr::pmap_dfr
and tidyr::pivot_wider
:
library(tidyverse)
library(lubridate)
pmap_dfr(df, ~ list(date=seq(ymd(..1), ymd(..2), 1), name=..3, value=..4)) %>%
pivot_wider(date)
#> # A tibble: 7 × 3
#> date x y
#> <date> <int> <int>
#> 1 2020-01-01 2 NA
#> 2 2020-01-02 2 NA
#> 3 2020-01-03 2 NA
#> 4 2020-01-04 2 NA
#> 5 2020-01-05 2 4
#> 6 2020-01-06 2 4
#> 7 2020-01-07 NA 4
CodePudding user response:
Convert the '_date' columns to Date
class (if it is not already in Date
class), then create the 'date' column by getting the seq
between the 'start_date', 'end_date' using map2
(or can also do after rowwise
), unnest
the list
column and reshape into 'wide' format with pivot_wider
library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
df1 %>%
mutate(across(ends_with('_date'), ymd)) %>%
transmute(date = map2(start_date, end_date, seq, by = "1 day"),
name, value) %>%
unnest(date) %>%
pivot_wider(names_from = name, values_from = value)
-output
# A tibble: 7 × 3
date x y
<date> <int> <int>
1 2020-01-01 2 NA
2 2020-01-02 2 NA
3 2020-01-03 2 NA
4 2020-01-04 2 NA
5 2020-01-05 2 4
6 2020-01-06 2 4
7 2020-01-07 NA 4
As the OP mentioned base R
methods, an option is to use Map
to create the sequence between the '_date' columns, create data.frame within list
and merge
them with Reduce
Reduce(function(...) merge(..., all = TRUE),
do.call(Map, c(f = function(u, v, nm, val) {
dat <- data.frame(date = seq(as.Date(u), as.Date(v),
by = "1 day"), val)
names(dat)[2] <- nm; dat
}, unname(df1))))
-output
date x y
1 2020-01-01 2 NA
2 2020-01-02 2 NA
3 2020-01-03 2 NA
4 2020-01-04 2 NA
5 2020-01-05 2 4
6 2020-01-06 2 4
7 2020-01-07 NA 4
data
df1 <- structure(list(start_date = c("2020-01-01", "2020-01-05"),
end_date = c("2020-01-06",
"2020-01-07"), name = c("x", "y"), value = c(2L, 4L)),
class = "data.frame", row.names = c("1",
"2"))
CodePudding user response:
You could use a series of tidyr
functions to achieve it.
library(tidyr)
df %>%
pivot_longer(ends_with("date"), names_to = NULL, values_to = "date",
values_transform = list(date = as.Date)) %>%
dplyr::group_by(name) %>%
complete(date = full_seq(date, 1)) %>%
fill(value) %>%
pivot_wider(names_from = name, values_from = value)
# # A tibble: 7 × 3
# date x y
# <date> <int> <int>
# 1 2020-01-01 2 NA
# 2 2020-01-02 2 NA
# 3 2020-01-03 2 NA
# 4 2020-01-04 2 NA
# 5 2020-01-05 2 4
# 6 2020-01-06 2 4
# 7 2020-01-07 NA 4