Categorize categorical data in R

Time:04-26

I am trying to recode a column of categorical data into broader categories.

Reprex:
a <- c("DN974", "DN469B", "DN469W;DN469E", "DN80", "EZDH01", "DN971")
df <- data.frame(a)
df <- mutate(df, a=if_else((a=="DN974" | a=="DN469B"), "a", 
                          (if_else(a=="DN469W;DN469E" | a=="DN80"), "b", "c")))


I am trying to use the if_else function, but I can't get it to work. I get this error:
Error: unexpected ',' in:
"df <- mutate(df, a=if_else((a=="DN974" | a=="DN469B"), "a", 
                          (if_else(a=="DN469W;DN469E" | a=="DN80"),"

| current       | desired |
| DN974         | a       |
| DN469B        | a       |
| DN469W;DN469E | b       |
| DN80          | b       |
| EZDH01        | c       |
| DN971         | c       |

Am I using the correct function, and if so, what am I doing wrong? Thanks.

CodePudding user response:

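The error comes from a misplaced parenthesis: the inner if_else( is closed right after its condition, so "b" and "c" end up outside the call, inside a plain pair of parentheses where a comma is not allowed. Moving that closing parenthesis to after "c" fixes the syntax. That said, nested if_else calls quickly become hard to read and make mistakes like this one easy, so case_when is the better tool here.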

suppressPackageStartupMessages(library(dplyr))

a <- c("DN974", "DN469B", "DN469W;DN469E", "DN80", "EZDH01", "DN971")
df <- data.frame(a)

df <- df %>%
  mutate(a = case_when(
    a %in% c("DN974","DN469B") ~ "a",
    a %in% c("DN469W;DN469E", "DN80") ~ "b",
    TRUE ~ "c"
  ))

df
#>   a
#> 1 a
#> 2 a
#> 3 b
#> 4 b
#> 5 c
#> 6 c

Created on 2022-04-25 by the reprex package (v2.0.1)
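
For comparison, here is a minimal sketch of the original nested if_else approach with the parenthesis moved so that the inner call receives all three arguments. It assumes dplyr is installed and uses %in% instead of chained == comparisons purely for readability; it is one possible fix, not the only one.

```r
library(dplyr)

a <- c("DN974", "DN469B", "DN469W;DN469E", "DN80", "EZDH01", "DN971")
df <- data.frame(a)

# The inner if_else() is closed only after its third argument ("c"),
# so it gets: condition, value if TRUE, value if FALSE.
df <- mutate(df, a = if_else(a %in% c("DN974", "DN469B"), "a",
                     if_else(a %in% c("DN469W;DN469E", "DN80"), "b", "c")))

df$a
#> [1] "a" "a" "b" "b" "c" "c"
```

This gives the same result as the case_when version above, but each additional category adds another level of nesting, which is where case_when stays easier to read.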
