In the following simplified example, you can see an id column (e.g. Product ID), a Date column containing actual and future dates, a value column and a column, indicating if it is an actual value, or a forecasted value using ML.
My goal is to replace the NA's for each Model, with the last value and date of the ACT column of each id. In my example this would mean for id A1, to replace the NA's in ML1 and ML2 with 2014-01-01 as Date and 54 as Value.
library(tidyverse)
df <- tibble(id = c(rep("A1",11), rep("B1",11)),
Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
"2014-01-01",NA, "2015-01-01","2016-01-01",
NA, "2015-01-01","2016-01-01"),2)),
Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
c(54,41,32,65,76,NA,32,42,NA,23,76))
)
I am searching for a pipeable solution like dplyr with no for-loops.
CodePudding user response:
How about this:
library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
df <- tibble(id = c(rep("A1",11), rep("B1",11)),
Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
"2014-01-01",NA, "2015-01-01","2016-01-01",
NA, "2015-01-01","2016-01-01"),2)),
Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
c(54,41,32,65,76,NA,32,42,NA,23,76))
)
df %>%
group_by(id) %>%
filter(Model == "ACT") %>%
summarise(across(c(Date, Value), last)) %>%
rename(date_fill = Date, value_fill = Value) %>%
right_join(df) %>%
mutate(Value = case_when(Model != "Act" & is.na(Value) ~ value_fill, TRUE ~ Value),
Date = case_when(Model != "Act" & is.na(Date) ~ date_fill, TRUE ~ Date)) %>%
select(-c("date_fill", "value_fill"))
#> Joining, by = "id"
#> # A tibble: 22 × 4
#> id Model Date Value
#> <chr> <chr> <date> <dbl>
#> 1 A1 ACT 2010-01-01 11
#> 2 A1 ACT 2011-01-01 31
#> 3 A1 ACT 2012-01-01 44
#> 4 A1 ACT 2013-01-01 21
#> 5 A1 ACT 2014-01-01 54
#> 6 A1 ML1 2014-01-01 54
#> 7 A1 ML1 2015-01-01 53
#> 8 A1 ML1 2016-01-01 13
#> 9 A1 ML2 2014-01-01 54
#> 10 A1 ML2 2015-01-01 33
#> # … with 12 more rows
Created on 2022-03-28 by the reprex package (v2.0.1)