Home > Blockchain >  Spread values based on start-date/end-date
Spread values based on start-date/end-date

Time:06-24

I have a dataframe which looks something like:

  start_date   end_date name value
1 2020-01-01 2020-01-06    x     2
2 2020-01-05 2020-01-07    y     4

I'd like to spread the dates by day, so that the name variables become columns taking their respective value, as so:

        date  x  y
1 2020-01-01  2 NA
2 2020-01-02  2 NA
3 2020-01-03  2 NA
4 2020-01-04  2 NA
5 2020-01-05  2  4
6 2020-01-06  2  4
7 2020-01-07 NA  4

I would be interested in a base R solution, but also solutions using dplyr and/or lubridate

CodePudding user response:

Another possible solution, which combines purrr::pmap_dfr and tidyr::pivot_wider:

library(tidyverse)
library(lubridate)

pmap_dfr(df, ~ list(date=seq(ymd(..1), ymd(..2), 1), name=..3, value=..4)) %>% 
  pivot_wider(date)

#> # A tibble: 7 × 3
#>   date           x     y
#>   <date>     <int> <int>
#> 1 2020-01-01     2    NA
#> 2 2020-01-02     2    NA
#> 3 2020-01-03     2    NA
#> 4 2020-01-04     2    NA
#> 5 2020-01-05     2     4
#> 6 2020-01-06     2     4
#> 7 2020-01-07    NA     4

CodePudding user response:

Convert the '_date' columns to Date class (if it is not already in Date class), then create the 'date' column by getting the seq between the 'start_date', 'end_date' using map2 (or can also do after rowwise), unnest the list column and reshape into 'wide' format with pivot_wider

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
df1 %>% 
  mutate(across(ends_with('_date'), ymd)) %>% 
  transmute(date = map2(start_date, end_date, seq, by = "1 day"), 
       name, value) %>% 
  unnest(date) %>%
  pivot_wider(names_from = name, values_from = value)

-output

# A tibble: 7 × 3
  date           x     y
  <date>     <int> <int>
1 2020-01-01     2    NA
2 2020-01-02     2    NA
3 2020-01-03     2    NA
4 2020-01-04     2    NA
5 2020-01-05     2     4
6 2020-01-06     2     4
7 2020-01-07    NA     4

As the OP mentioned base R methods, an option is to use Map to create the sequence between the '_date' columns, create data.frame within list and merge them with Reduce

Reduce(function(...) merge(..., all = TRUE),
   do.call(Map, c(f = function(u, v, nm, val) {
    dat <- data.frame(date = seq(as.Date(u), as.Date(v), 
   by = "1 day"),  val)
  names(dat)[2] <- nm; dat
   }, unname(df1))))

-output

        date  x  y
1 2020-01-01  2 NA
2 2020-01-02  2 NA
3 2020-01-03  2 NA
4 2020-01-04  2 NA
5 2020-01-05  2  4
6 2020-01-06  2  4
7 2020-01-07 NA  4

data

df1 <- structure(list(start_date = c("2020-01-01", "2020-01-05"), 
end_date = c("2020-01-06", 
"2020-01-07"), name = c("x", "y"), value = c(2L, 4L)), 
class = "data.frame", row.names = c("1", 
"2"))

CodePudding user response:

You could use a series of tidyr functions to achieve it.

library(tidyr)

df %>%
  pivot_longer(ends_with("date"), names_to = NULL, values_to = "date",
               values_transform = list(date = as.Date)) %>%
  dplyr::group_by(name) %>%
  complete(date = full_seq(date, 1)) %>%
  fill(value) %>% 
  pivot_wider(names_from = name, values_from = value)

# # A tibble: 7 × 3
#   date           x     y
#   <date>     <int> <int>
# 1 2020-01-01     2    NA
# 2 2020-01-02     2    NA
# 3 2020-01-03     2    NA
# 4 2020-01-04     2    NA
# 5 2020-01-05     2     4
# 6 2020-01-06     2     4
# 7 2020-01-07    NA     4
  • Related