Home > Blockchain >  Add date rows between min and max date in dataframe
Add date rows between min and max date in dataframe

Time:10-25

In a dataframe I only have the dates when a value is added. The dataframe should be incremental or cumulative. I have this sample dataframe:

library(tidyverse)

tibble::tribble(
      ~name,        ~date, ~value,
   "Corpus", "2019-03-01",    0.8,
   "Corpus", "2019-08-01",    1.6,
  "Cameron", "2019-12-01",   1.32,
  "Cameron", "2020-08-01",   1.98
  )

Then this is my desired outcome. How do I do this?

tibble::tribble(
      ~name,        ~date, ~value,
   "Corpus", "2019-03-01",  0.8,
   "Corpus", "2019-04-01",  0.8,
   "Corpus", "2019-05-01",  0.8,
   "Corpus", "2019-06-01",  0.8,
   "Corpus", "2019-07-01",  0.8,
   "Corpus", "2019-08-01",  1.6,
   "Corpus", "2019-09-01",  1.6,
   "Corpus", "2019-10-01",  1.6,
   "Corpus", "2019-11-01",  1.6,
   "Corpus", "2019-12-01",  1.6,
   "Corpus", "2020-01-01",  1.6,
   "Corpus", "2020-02-01",  1.6,
   "Corpus", "2020-03-01",  1.6,
   "Corpus", "2020-04-01",  1.6,
   "Corpus", "2020-05-01",  1.6,
   "Corpus", "2020-06-01",  1.6,
   "Corpus", "2020-07-01",  1.6,
   "Corpus", "2020-08-01",  1.6,
  "Cameron", "2019-03-01",   0,
  "Cameron", "2019-04-01",   0,
  "Cameron", "2019-05-01",   0,
  "Cameron", "2019-06-01",   0,
  "Cameron", "2019-07-01",   0,
  "Cameron", "2019-08-01",   0,
  "Cameron", "2019-09-01",   0,
  "Cameron", "2019-10-01",   0,
  "Cameron", "2019-11-01",   0,
  "Cameron", "2019-12-01", 1.32,
  "Cameron", "2020-01-01", 1.32,
  "Cameron", "2020-02-01", 1.32,
  "Cameron", "2020-03-01", 1.32,
  "Cameron", "2020-04-01", 1.32,
  "Cameron", "2020-05-01", 1.32,
  "Cameron", "2020-06-01", 1.32,
  "Cameron", "2020-07-01", 1.32,
  "Cameron", "2020-08-01", 1.98
  )

In this way I could for example create an area chart where Cameron starts later.

enter image description here

CodePudding user response:

You can use tidyr::complete(), followed by a grouped cumsum():

library(tidyr)
library(dplyr)
library(lubridate)

tribble(
    ~name,        ~date, ~value,
    "Corpus", "2019-03-01",    0.8,
    "Corpus", "2019-08-01",    1.6,
    "Cameron", "2019-12-01",   1.32,
    "Cameron", "2020-08-01",   1.98
  ) %>% 
  mutate(date = ymd(date)) %>% 
  complete(
    name, 
    date = seq(min(date), max(date), by = "month"), 
    fill = list(value = 0)
  ) %>% 
  group_by(name) %>% 
  mutate(value = cumsum(value)) %>% 
  ungroup()

#> # A tibble: 36 × 3
#>    name    date       value
#>    <chr>   <date>     <dbl>
#>  1 Cameron 2019-03-01  0   
#>  2 Cameron 2019-04-01  0   
#>  3 Cameron 2019-05-01  0   
#>  4 Cameron 2019-06-01  0   
#>  5 Cameron 2019-07-01  0   
#>  6 Cameron 2019-08-01  0   
#>  7 Cameron 2019-09-01  0   
#>  8 Cameron 2019-10-01  0   
#>  9 Cameron 2019-11-01  0   
#> 10 Cameron 2019-12-01  1.32
#> # … with 26 more rows

Created on 2022-10-24 with reprex v2.0.2

  • Related