I'm having a weird issue with mutating variables in dplyr. If I run this code:
diamonds %>%
  select(cut) %>%
  table()
I see this tabulation of the levels of the cut factor in the diamonds dataset in R:
cut
     Fair      Good Very Good   Premium     Ideal
     1610      4906     12082     13791     21551
However, if I try to change one of the names and leave the rest alone:
diamonds %>%
  mutate(cut.fix = ifelse(cut == "Fair",
                          "Not Fair at All",
                          cut)) %>%
  select(cut.fix) %>%
  table()
It only changes the "fixed" value and everything else gets turned into numeric values:
cut.fix
              2               3               4               5
           4906           12082           13791           21551
Not Fair at All
           1610
What is the reason for this and how can I fix it?
CodePudding user response:
The error from if_else() is more informative in this case:
library(tidyverse)
diamonds %>%
  select(cut) %>%
  table()
#> .
#>      Fair      Good Very Good   Premium     Ideal
#>      1610      4906     12082     13791     21551
diamonds %>%
  mutate(cut.fix = if_else(cut == "Fair",
                           "Not Fair at All",
                           cut)) %>%
  select(cut.fix) %>%
  table()
#> Error in `mutate()`:
#> ! Problem while computing `cut.fix = if_else(cut == "Fair", "Not Fair at
#> All", cut)`.
#> Caused by error in `if_else()`:
#> ! `false` must be a character vector, not a `ordered/factor` object.
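The ifelse() function isn't 'type safe': it strips attributes from its result, so the factor "cut" is reduced to its underlying integer codes (2, 3, 4, 5), which are then coerced to character alongside the replacement string. The dplyr if_else() function is stricter: it throws an error when the true and false arguments have incompatible types, so you can adjust accordingly, e.g. by converting the ordered factor "cut" to character first: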
diamonds %>%
  mutate(cut.fix = if_else(cut == "Fair",
                           "Not Fair at All",
                           as.character(cut))) %>%
  select(cut.fix) %>%
  table()
#> .
#>            Good           Ideal Not Fair at All         Premium       Very Good
#>            4906           21551            1610           13791           12082
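To see where the numbers in the question come from, here is a minimal base R sketch. It uses a small made-up ordered factor (toy_cut is a hypothetical stand-in, not the real diamonds data) to show that a factor is stored as integer codes and that ifelse() returns those codes as strings:

```r
# Hypothetical toy data standing in for diamonds$cut
toy_cut <- factor(c("Fair", "Good", "Ideal", "Fair"),
                  levels = c("Fair", "Good", "Ideal"),
                  ordered = TRUE)

# A factor is stored as integer codes plus a "levels" attribute
as.integer(toy_cut)
#> [1] 1 2 3 1

# ifelse() drops the factor attributes, so the "no" branch contributes
# the integer codes, which are coerced to character in the result
res <- ifelse(toy_cut == "Fair", "Not Fair at All", toy_cut)
res
#> [1] "Not Fair at All" "2"               "3"               "Not Fair at All"
```

The "2" and "3" here are the codes for Good and Ideal, which matches the 2 to 5 seen in the question's table.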
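This 'works', but the result is a plain character vector, so the level ordering is lost (the table above is sorted alphabetically). As @RitchieSacramento points out, a better solution is to recode the "cut" variable and retain the factor level information, e.g. using dplyr::recode():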
diamonds %>%
  mutate(cut.fix = recode(cut, "Fair" = "Not Fair at All")) %>%
  select(cut.fix) %>%
  table()
#> .
#> Not Fair at All            Good       Very Good         Premium           Ideal
#>            1610            4906           12082           13791           21551
Or, the solution from @RitchieSacramento's comment above, using forcats::fct_recode() (note that its arguments are written as new = old, the reverse of recode()):
diamonds %>%
  mutate(cut.fix = fct_recode(cut, "Not Fair at All" = "Fair")) %>%
  select(cut.fix) %>%
  table()
#> .
#> Not Fair at All            Good       Very Good         Premium           Ideal
#>            1610            4906           12082           13791           21551
Created on 2022-09-27 by the reprex package (v2.0.1)
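If you would rather not depend on a recoding helper, the same idea can be sketched in base R by renaming the level directly. This is an alternative, not what the answer above uses, and toy_cut / toy_fix are hypothetical names for a small made-up factor rather than the real diamonds data:

```r
# Hypothetical toy data standing in for diamonds$cut
toy_cut <- factor(c("Fair", "Good", "Ideal", "Fair"),
                  levels = c("Fair", "Good", "Ideal"),
                  ordered = TRUE)

toy_fix <- toy_cut
# Rename a single level in place; the integer codes and the level
# order are left untouched, so the factor stays ordered
levels(toy_fix)[levels(toy_fix) == "Fair"] <- "Not Fair at All"

levels(toy_fix)
is.ordered(toy_fix)
table(toy_fix)
```

Because only the label changes, the renamed level stays in first position, as in the recode() and fct_recode() tables above.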