In a dataframe I only have the dates when a value is added. The dataframe should be incremental or cumulative. I have this sample dataframe:
library(tidyverse)
tibble::tribble(
~name, ~date, ~value,
"Corpus", "2019-03-01", 0.8,
"Corpus", "2019-08-01", 1.6,
"Cameron", "2019-12-01", 1.32,
"Cameron", "2020-08-01", 1.98
)
Then this is my desired outcome. How do I do this?
tibble::tribble(
~name, ~date, ~value,
"Corpus", "2019-03-01", 0.8,
"Corpus", "2019-04-01", 0.8,
"Corpus", "2019-05-01", 0.8,
"Corpus", "2019-06-01", 0.8,
"Corpus", "2019-07-01", 0.8,
"Corpus", "2019-08-01", 1.6,
"Corpus", "2019-09-01", 1.6,
"Corpus", "2019-10-01", 1.6,
"Corpus", "2019-11-01", 1.6,
"Corpus", "2019-12-01", 1.6,
"Corpus", "2020-01-01", 1.6,
"Corpus", "2020-02-01", 1.6,
"Corpus", "2020-03-01", 1.6,
"Corpus", "2020-04-01", 1.6,
"Corpus", "2020-05-01", 1.6,
"Corpus", "2020-06-01", 1.6,
"Corpus", "2020-07-01", 1.6,
"Corpus", "2020-08-01", 1.6,
"Cameron", "2019-03-01", 0,
"Cameron", "2019-04-01", 0,
"Cameron", "2019-05-01", 0,
"Cameron", "2019-06-01", 0,
"Cameron", "2019-07-01", 0,
"Cameron", "2019-08-01", 0,
"Cameron", "2019-09-01", 0,
"Cameron", "2019-10-01", 0,
"Cameron", "2019-11-01", 0,
"Cameron", "2019-12-01", 1.32,
"Cameron", "2020-01-01", 1.32,
"Cameron", "2020-02-01", 1.32,
"Cameron", "2020-03-01", 1.32,
"Cameron", "2020-04-01", 1.32,
"Cameron", "2020-05-01", 1.32,
"Cameron", "2020-06-01", 1.32,
"Cameron", "2020-07-01", 1.32,
"Cameron", "2020-08-01", 1.98
)
In this way I could for example create an area chart where Cameron starts later.
CodePudding user response:
You can use tidyr::complete()
, followed by a grouped cumsum()
:
library(tidyr)
library(dplyr)
library(lubridate)
tribble(
~name, ~date, ~value,
"Corpus", "2019-03-01", 0.8,
"Corpus", "2019-08-01", 1.6,
"Cameron", "2019-12-01", 1.32,
"Cameron", "2020-08-01", 1.98
) %>%
mutate(date = ymd(date)) %>%
complete(
name,
date = seq(min(date), max(date), by = "month"),
fill = list(value = 0)
) %>%
group_by(name) %>%
mutate(value = cumsum(value)) %>%
ungroup()
#> # A tibble: 36 × 3
#> name date value
#> <chr> <date> <dbl>
#> 1 Cameron 2019-03-01 0
#> 2 Cameron 2019-04-01 0
#> 3 Cameron 2019-05-01 0
#> 4 Cameron 2019-06-01 0
#> 5 Cameron 2019-07-01 0
#> 6 Cameron 2019-08-01 0
#> 7 Cameron 2019-09-01 0
#> 8 Cameron 2019-10-01 0
#> 9 Cameron 2019-11-01 0
#> 10 Cameron 2019-12-01 1.32
#> # … with 26 more rows
Created on 2022-10-24 with reprex v2.0.2