Error when using case_when with mutate and functions: getting closest number to zero with NA

For each person, I am trying to flag the number of days closest to zero between an accident and when they got insurance. 0 means the same day, a negative number means insurance was obtained before the accident, and a positive number means it was obtained after the accident.

Data

df <- data.frame(id=1:4, accident1=c(-1,3,4, NA), accident2=c(100, -2, NA, NA), accident3=c(-3,1,2, NA))
    
> df
  id accident1 accident2 accident3
1  1        -1       100        -3
2  2         3        -2         1
3  3         4        NA         2
4  4        NA        NA        NA

Code:

library(DescTools)
library(dplyr)

df %>%
  rowwise() %>%
  mutate(magic=
           case_when(
             accident1 <0 |accident2<0 |accident3<0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm=T)),
             accident1 >0 |accident2>0 | accident3>0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm=T)),
             accident1 > 0 & accident2 > 0 & accident3>0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm=T)),
             accident1 < 0 & accident2 < 0 & accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm=T)),
             TRUE ~ NA_real_)) %>%
         rowwise() %>% 
  # not working
  mutate(magic= case_when(
             (accident1 >0 |accident2<0 |accident3<0) & (accident1 >0 |accident2>0 | accident3>0) ~ 
               Closest(as.numeric(unlist(c(accident1, accident2, accident3))), 0, na.rm=T), TRUE~magic))

Result without last row in data (all NAs):

# A tibble: 3 × 5
# Rowwise: 
     id accident1 accident2 accident3 magic
  <int>     <dbl>     <dbl>     <dbl> <dbl>
1     1        -1       100        -3    -1
2     2         3        -2         1     1
3     3         4        NA         2     2

However, when I include the last row, where every value is NA, I get:

Error: Problem with `mutate()` column `magic`.
ℹ `magic = case_when(...)`.
ℹ `magic` must be size 1, not 0.
ℹ Did you mean: `magic = list(case_when(...))` ?
ℹ The error occurred in row 4.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Problem with `mutate()` column `magic`.
ℹ `magic = case_when(...)`.
ℹ no non-missing arguments to min; returning Inf
ℹ The warning occurred in row 4.

Any advice on how to get the code to work on row 4 with NAs?

CodePudding user response：

The error comes from the last row: all of its elements are NA, and because we call Closest with na.rm = TRUE, nothing is left to compare and it returns numeric(0). We can prevent that by indexing the result with [1], which returns the first element, or NA when the result is empty.

library(dplyr)
library(DescTools)  # provides Closest()

df %>%
  mutate(magic =
           case_when(
             accident1 < 0 | accident2 < 0 | accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = TRUE)),
             accident1 > 0 | accident2 > 0 | accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = TRUE)),
             accident1 > 0 & accident2 > 0 & accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = TRUE)),
             accident1 < 0 & accident2 < 0 & accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = TRUE)),
             TRUE ~ NA_real_)) %>%
  rowwise() %>%
  mutate(magic = case_when(
    (accident1 > 0 | accident2 < 0 | accident3 < 0) & (accident1 > 0 | accident2 > 0 | accident3 > 0) ~
      # [1] turns the zero-length result for the all-NA row into NA
      Closest(as.numeric(unlist(c(accident1, accident2, accident3))), 0, na.rm = TRUE)[1],
    TRUE ~ magic))

-output

# A tibble: 4 × 5
# Rowwise: 
     id accident1 accident2 accident3 magic
  <int>     <dbl>     <dbl>     <dbl> <dbl>
1     1        -1       100        -3    -1
2     2         3        -2         1     1
3     3         4        NA         2     2
4     4        NA        NA        NA    NA

It is easier to see what happens if we apply Closest on its own to each row:

> apply(df[-1], 1, function(x) Closest(x, 0, na.rm = TRUE))
[[1]]
accident1 
       -1 

[[2]]
accident3 
        1 

[[3]]
accident3 
        2 

[[4]]
named numeric(0)  ####

The solution indexes the result with [1], so that numeric(0) becomes NA:

numeric(0)[1]
[1] NA

CodePudding user response：

I want to provide a little more explanation as to why your initial approach didn't work and @akrun's answer works. First, let's save the results of the first mutate.

interim <- df %>%
  rowwise() %>%
  mutate(
    magic=
      case_when(
        accident1 < 0 | 
          accident2 < 0 | 
          accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm=T)),
        accident1 >0 | 
          accident2>0 | 
          accident3>0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm=T)),
        accident1 > 0 & 
          accident2 > 0 &
          accident3>0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm=T)),
        accident1 < 0 & 
          accident2 < 0 & 
          accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm=T)),
        TRUE ~ NA_real_))

The problem occurs because case_when evaluates the lhs and rhs separately. I.e., it calculates the potential values for every case, then selects which one to return. Consequently, every rhs must provide a valid result for every possible input value, regardless of whether that result is ever actually used.

Instead of Closest, take a simple function that always returns the same value.

alwaysComputes <- function(x) 4.2

interim %>% 
  mutate(
    magic = case_when(
      (accident1 >0 |accident2<0 |accident3<0) & 
        (accident1 >0 |accident2>0 | accident3>0) ~ 
        alwaysComputes(magic), 
      TRUE~magic
    )
  )
#> # A tibble: 4 x 5
#> # Rowwise: 
#>      id accident1 accident2 accident3 magic
#>   <int>     <dbl>     <dbl>     <dbl> <dbl>
#> 1     1        -1       100        -3   4.2
#> 2     2         3        -2         1   4.2
#> 3     3         4        NA         2   4.2
#> 4     4        NA        NA        NA  NA

No problems because even for NA the function always returns 4.2. case_when just doesn't return the 4.2 it calculated for row 4.
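To see directly that the rhs is evaluated even when its result is discarded, here is a minimal sketch. The function `counted` and the counter `calls` are hypothetical names made up for this illustration; only `case_when` comes from dplyr.

```r
library(dplyr)

calls <- 0

# Hypothetical helper: doubles its input and records that it was called.
counted <- function(x) {
  calls <<- calls + 1
  x * 2
}

x <- c(1, NA, 3)

# No element satisfies x > 100, so none of counted(x)'s values are used,
# yet the rhs is still evaluated before case_when picks its results.
res <- case_when(
  x > 100 ~ counted(x),
  TRUE ~ x
)

res    # unchanged input values
calls  # 1: counted() ran even though its result was thrown away
```

The counter ends up at 1 although no row takes the first branch, which is the same situation as row 4 above.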

If we modify the function so that it fails on NA instead of returning a value, case_when errors on row 4 in the same way it did with Closest:

condComputes <- function(x) if(is.na(x)) stop("problems") else x

interim %>% 
  mutate(
    magic = case_when(
      (accident1 >0 |accident2<0 |accident3<0) & 
        (accident1 >0 |accident2>0 | accident3>0) ~ 
        condComputes(magic), 
      TRUE~magic
    )
  )
#> Error: Problem with `mutate()` input `magic`.
#> x problems
#> i Input `magic` is `case_when(...)`.
#> i The error occurred in row 4.

In your original approach, the rhs of the first case (the Closest call) returns a zero-length vector for row 4. Even though that value would ultimately be discarded, case_when doesn't know this at the time it is computed, so it raises the error. @akrun's solution appends [1], which forces the rhs to return either the first (or only) value, or NA when there is none. R has no scalars, only vectors, so indexing past the end of an empty vector yields NA rather than an error.
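Following that reasoning, another option is to make the computation itself always return exactly one value, so there is nothing to guard afterwards. This is only a sketch in base R: `nearest_zero` is a hypothetical helper written for this example, not a DescTools or dplyr function, and it replaces Closest rather than wrapping it.

```r
df <- data.frame(id = 1:4,
                 accident1 = c(-1, 3, 4, NA),
                 accident2 = c(100, -2, NA, NA),
                 accident3 = c(-3, 1, 2, NA))

# Hypothetical helper: value of x nearest to zero, always length one.
nearest_zero <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  if (length(x) == 0) return(NA_real_)   # all-NA row: return NA, not numeric(0)
  x[which.min(abs(x))]                   # first value with the smallest |x|
}

# Apply the helper to each row of the accident columns.
df$magic <- unname(apply(df[-1], 1, nearest_zero))
df$magic
```

Note one assumption about ties: `which.min` keeps only the first value when two are equally close to zero (for example -1 and 1), which matches what indexing with [1] does in the answer above.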