I am looking for a tidyverse solution to the following problem.
I have the following example dataset:
ex_tbl <- tibble(col1 = c(5, NA, NA, NA), col2 = c(1, 1.3, 0.8, 1.2))
I need to fill in the NAs based on col2 and the previous value in the col1. If I start with the first NA (col1, row 2), and col2 is greater than 1, then I'd multiply the previous col1 value (row 1) by col2 row 2. Then if col2 is less than or equal to 1, then col1 is just equal to the previous value.
This should be the final result:
col1 col2
5 1
6.5 1.3
6.5 0.8
7.8 1.2
I tried case_when()
but am not getting the desired result:
output <- ex_tbl %>% mutate(col1 = case_when(col2 > 1 ~ col1 * col2,
col2 <= 1 ~ col1,
TRUE ~ col1)
Clearly, I'm not capturing the part where I want to use the previous row's value. Any help would be appreciated.
CodePudding user response:
Perhaps you could use
library(dplyr)
ex_tbl %>%
mutate(col1 = first(col1) * cumprod(pmax(col2, 1)))
to get
# A tibble: 4 x 2
col1 col2
<dbl> <dbl>
1 5 1
2 6.5 1.3
3 6.5 0.8
4 7.8 1.2
Edit: Follow-up question
If you want to multiply by a value from a third column in case of col2 < 1
, you could use:
ex_tbl <- tibble(col1 = c(5, NA, NA, NA), col2 = c(1, 1.3, 0.8, 1.2), col3 = c(0.5, 2, 2, 0.3))
ex_tbl %>%
mutate(
factor = (col2 >= 1) * col2 (col2 < 1) * col3,
col1 = first(col1) * cumprod((col2 >= 1) * col2 (col2 < 1) * col3))
Note: the column factor
here isn't necessary. It's just to show what is happening inside the cumprod
function.
CodePudding user response:
We may need accumulate
here
library(dplyr)
library(purrr)
ex_tbl %>%
mutate(coln = accumulate(col2, ~ if(.y <= 1) .x else .x * .y ,
.init = first(col1))[-1])
-output
# A tibble: 4 × 3
col1 col2 coln
<dbl> <dbl> <dbl>
1 5 1 5
2 NA 1.3 6.5
3 NA 0.8 6.5
4 NA 1.2 7.8