Why is my ifelse statement converting factors into numbers?

Time:09-27

I'm having a weird issue with mutating variables in dplyr. If I run this code:

diamonds %>% 
  select(cut) %>% 
  table()

I see this tabulation of the factors in the diamonds dataset in R:

cut
     Fair      Good Very Good   Premium     Ideal 
     1610      4906     12082     13791     21551 

However, if I try to change one of the names and leave the rest alone:

diamonds %>% 
  mutate(cut.fix = ifelse(cut == "Fair",
                          "Not Fair at All",
                          cut)) %>% 
  select(cut.fix) %>% 
  table()

It only changes the "fixed" value and everything else gets turned into numeric values:

cut.fix
              2               3               4               5 
           4906           12082           13791           21551 
Not Fair at All 
           1610 

What is the reason for this and how can I fix it?

Answer:

The error from dplyr's if_else() is more informative in this case:

library(tidyverse)

diamonds %>% 
  select(cut) %>% 
  table()
#> .
#>      Fair      Good Very Good   Premium     Ideal 
#>      1610      4906     12082     13791     21551

diamonds %>% 
  mutate(cut.fix = if_else(cut == "Fair",
                           "Not Fair at All",
                           cut)) %>% 
  select(cut.fix) %>% 
  table()
#> Error in `mutate()`:
#> ! Problem while computing `cut.fix = if_else(cut == "Fair", "Not Fair at
#>   All", cut)`.
#> Caused by error in `if_else()`:
#> ! `false` must be a character vector, not a `ordered/factor` object.
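
The numeric labels in the question's output are the factor's internal integer codes. Here is a minimal base-R sketch of how they leak out, using a made-up three-level factor (cut_toy is a hypothetical stand-in for diamonds$cut, not part of the original post):

```r
# Toy ordered factor standing in for diamonds$cut (values are made up).
cut_toy <- factor(c("Fair", "Good", "Ideal", "Good"),
                  levels = c("Fair", "Good", "Ideal"),
                  ordered = TRUE)

# A factor is an integer vector plus "levels" and "class" attributes.
codes <- as.integer(cut_toy)
codes
#> [1] 1 2 3 2

# ifelse() copies values from `no` into the result without the factor's
# attributes, so the non-"Fair" entries arrive as bare integer codes and
# are then coerced to character next to "Not Fair at All".
fixed <- ifelse(cut_toy == "Fair", "Not Fair at All", cut_toy)
fixed
#> [1] "Not Fair at All" "2"               "3"               "2"
```

The level labels ("Good", "Ideal") never reach the result, which matches the 2, 3, 4, 5 seen in the question's table.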

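The ifelse() function isn't type-safe: it copies values from its yes and no arguments into the result without their attributes. A factor is stored as integer codes plus a levels attribute, so every row that isn't "Fair" comes back as its bare code (2 to 5), which is then coerced to character alongside "Not Fair at All". The dplyr if_else() function is stricter: it throws an error when the two branches have different types, so you can adjust accordingly, e.g. by converting the ordered factor "cut" to character: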

diamonds %>% 
  mutate(cut.fix = if_else(cut == "Fair",
                           "Not Fair at All",
                           as.character(cut))) %>% 
  select(cut.fix) %>% 
  table()
#> .
#>            Good           Ideal Not Fair at All         Premium       Very Good 
#>            4906           21551            1610           13791           12082

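This 'works', but the result is a plain character vector, so the factor's level order is lost (the table above is sorted alphabetically). As @RitchieSacramento points out, a better solution is to recode the "cut" variable and retain the factor level information, e.g. using dplyr::recode():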

diamonds %>% 
  mutate(cut.fix = recode(cut, "Fair" = "Not Fair at All")) %>% 
  select(cut.fix) %>% 
  table()
#> .
#> Not Fair at All            Good       Very Good         Premium           Ideal 
#>            1610            4906           12082           13791           21551

Or, the solution from @RitchieSacramento's comment, using forcats::fct_recode() (note that the argument order is reversed here: new name = old name):

diamonds %>%
  mutate(cut.fix = fct_recode(cut, "Not Fair at All" = "Fair")) %>%
  select(cut.fix) %>%
  table()
#> .
#> Not Fair at All            Good       Very Good         Premium           Ideal 
#>            1610            4906           12082           13791           21551
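
For comparison, the same relabeling can be sketched in base R by editing the level label directly, which also keeps the level order and the ordered class. This is an illustrative alternative, not something from the original answer, and cut_toy is again a hypothetical stand-in for diamonds$cut:

```r
# Toy ordered factor standing in for diamonds$cut (values are made up).
cut_toy <- factor(c("Fair", "Good", "Ideal", "Good"),
                  levels = c("Fair", "Good", "Ideal"),
                  ordered = TRUE)

# Rename one level in place; the integer codes and level order are untouched.
cut_fix <- cut_toy
levels(cut_fix)[levels(cut_fix) == "Fair"] <- "Not Fair at All"

levels(cut_fix)
#> [1] "Not Fair at All" "Good"            "Ideal"
is.ordered(cut_fix)
#> [1] TRUE
```

Because only the label changes, tabulating cut_fix lists the levels in their original order rather than alphabetically.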

Created on 2022-09-27 by the reprex package (v2.0.1)
