How to capture logic from case_when in dplyr


I am using case_when() from dplyr to create the following column, result.

library(dplyr)

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

z %>%
  mutate(result = case_when(
    !is.na(a) ~ a,
    is.na(a) & !is.na(b) ~ b
  ))

The above gives the following:

      a     b result
  <dbl> <dbl>  <dbl>
1    40    NA     40
2    30    20     30
3    NA    10     10   

However, I would like to simultaneously create another column, result_logic, which shows which column (a or b) the value in result came from. The output would look like this.

      a     b result result_logic
  <dbl> <dbl>  <dbl>        <chr>
1    40    NA     40          a
2    30    20     30          a
3    NA    10     10          b

Is there any way to capture this logic evaluated in case_when()?


CodePudding user response:

Something like the following?


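library(dplyr)
library(stringr)  # str_c()
library(tidyr)    # separate()

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

z %>%
  mutate(result = case_when(
    !is.na(a) ~ str_c(a, "a", sep = " "),
    is.na(a) & !is.na(b) ~ str_c(b, "b", sep = " "))) %>% 
  # split e.g. "40 a" into the value and the name of its source column
  separate(result, into = c("result", "result_logic"), sep = " ", convert = TRUE)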

#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <int> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b

CodePudding user response:

You could reverse the two steps above: first use case_when() to record which column is selected, then have the second step simply look up the value from that column. This involves only one case_when() call:


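library(dplyr)
library(purrr)  # map2_dbl()

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

z %>% 
  mutate(
    result_logic = case_when(
      !is.na(a) ~ "a",
      is.na(a) & !is.na(b) ~ "b"
    ),
    # for each row, pull the value from the column named in result_logic
    result = map2_dbl(row_number(), result_logic, ~ z[[.x, .y]])
  )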

#> # A tibble: 3 x 4
#>       a     b result_logic result
#>   <dbl> <dbl> <chr>         <dbl>
#> 1    40    NA a                40
#> 2    30    20 a                30
#> 3    NA    10 b                10

Created on 2021-12-20 by the reprex package (v2.0.1)
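
The same label-first idea can also be sketched without purrr, using base R matrix indexing for the lookup. This is only a minimal sketch: the helper names `cols`, `m` and `out` are introduced here for illustration, and it assumes the candidate columns are all numeric so they can be combined into one matrix.

```r
library(dplyr, warn.conflicts = FALSE)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

# Candidate columns, in priority order (illustrative helper)
cols <- c("a", "b")
# Numeric matrix holding just those columns
m <- as.matrix(z[cols])

out <- z %>%
  mutate(
    # record which column supplies the value
    result_logic = case_when(
      !is.na(a) ~ "a",
      !is.na(b) ~ "b"
    ),
    # index the matrix by (row number, column position) pairs
    result = m[cbind(row_number(), match(result_logic, cols))]
  )

out
```

Indexing a matrix with a two-column matrix of (row, column) pairs returns one value per pair, so the lookup is vectorised rather than evaluated row by row.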

CodePudding user response:

Here is an alternative approach using dplyr only:


z %>% 
  mutate(result = case_when(
    !is.na(a) ~ a, 
    is.na(a) & !is.na(b) ~ b),
    across(-result, ~case_when(
    !is.na(.) ~ cur_column()), .names = 'new_{col}'),
    result_logic = coalesce(new_a, new_b), .keep="unused")

      a     b result result_logic
  <dbl> <dbl>  <dbl> <chr>       
1    40    NA     40 a           
2    30    20     30 a           
3    NA    10     10 b  

CodePudding user response:

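library(dplyr, warn.conflicts = FALSE)
z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

z %>% 
  mutate(
    # first non-missing value across a and b
    result = do.call(coalesce, across(a:b)),
    # label each non-missing cell with its column name, then keep the first label
    result_logic = 
      do.call(coalesce, across(a:b, ~ ifelse(is.na(.), NA, cur_column()))))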
#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <dbl> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b

Created on 2021-12-20 by the reprex package (v2.0.1)

