Home > database >  How to replace values by multiple grouping conditions?
How to replace values by multiple grouping conditions?

Time:03-29

In the following simplified example, you can see an id column (e.g. Product ID), a Date column containing actual and future dates, a value column and a column, indicating if it is an actual value, or a forecasted value using ML.

My goal is to replace the NA's for each Model, with the last value and date of the ACT column of each id. In my example this would mean for id A1, to replace the NA's in ML1 and ML2 with 2014-01-01 as Date and 54 as Value.

library(tidyverse)

df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                             "2014-01-01",NA, "2015-01-01","2016-01-01",
                             NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                     c(54,41,32,65,76,NA,32,42,NA,23,76))
             )

I am searching for a pipeable solution like dplyr with no for-loops.

CodePudding user response:

How about this:

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2

df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                                  "2014-01-01",NA, "2015-01-01","2016-01-01",
                                  NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                       c(54,41,32,65,76,NA,32,42,NA,23,76))
)
df %>% 
group_by(id) %>% 
filter(Model == "ACT") %>% 
summarise(across(c(Date, Value), last)) %>% 
rename(date_fill = Date, value_fill = Value) %>% 
right_join(df) %>% 
mutate(Value = case_when(Model != "Act" & is.na(Value) ~ value_fill, TRUE  ~ Value), 
       Date = case_when(Model != "Act" & is.na(Date) ~ date_fill, TRUE  ~ Date)) %>% 
select(-c("date_fill", "value_fill"))
#> Joining, by = "id"
#> # A tibble: 22 × 4
#>    id    Model Date       Value
#>    <chr> <chr> <date>     <dbl>
#>  1 A1    ACT   2010-01-01    11
#>  2 A1    ACT   2011-01-01    31
#>  3 A1    ACT   2012-01-01    44
#>  4 A1    ACT   2013-01-01    21
#>  5 A1    ACT   2014-01-01    54
#>  6 A1    ML1   2014-01-01    54
#>  7 A1    ML1   2015-01-01    53
#>  8 A1    ML1   2016-01-01    13
#>  9 A1    ML2   2014-01-01    54
#> 10 A1    ML2   2015-01-01    33
#> # … with 12 more rows

Created on 2022-03-28 by the reprex package (v2.0.1)

  • Related