I have a categorical variable with over 1000 levels. I want to reduce the dimensionality by grouping similar levels together, so that I end up with just 5 general levels. The grouping should be based on the level names.
For example, all levels that contain the word "immune" I want to group into a new group called "immune group". All levels that contain the word "eyes" I want to group into a new group called "eye group", etc.
I've tried str_detect and grepl in R with little success. Are there any other methods that could do this efficiently?
CodePudding user response:
Maybe use case_when from dplyr together with str_detect from stringr. It would help to have a reproducible example, though.
CodePudding user response:
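library(dplyr)    # case_when
library(stringr)  # str_detect

# Example levels
x <- c("immune1", "immune2", "eyes1", "eyes2")

# The first matching condition wins; anything unmatched becomes NA
case_when(
  str_detect(x, "immune") ~ "immune group",
  str_detect(x, "eyes") ~ "eye group",
  TRUE ~ NA_character_
)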
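If the variable is stored as a factor, one possible alternative is to recode its levels rather than every observation, so the pattern matching runs about 1000 times instead of once per row. The following is only a base R sketch under some assumptions: the helper group_levels, the patterns lookup, and the example values are hypothetical names made up for illustration, and matching is done case-insensitively.

```r
# Hypothetical lookup: group label -> pattern to search for.
patterns <- c("immune group" = "immune", "eye group" = "eyes")

# Made-up example values for illustration.
x <- c("immune1", "Immune disorder", "eyes2", "dry eyes", "other")

# Hypothetical helper: returns one group label per element of x.
group_levels <- function(x, patterns, default = NA_character_) {
  out <- rep(default, length(x))
  # Walk the patterns in reverse so that earlier patterns win on overlap.
  for (label in rev(names(patterns))) {
    hit <- grepl(patterns[[label]], x, ignore.case = TRUE)
    out[hit] <- label
  }
  out
}

# Works directly on a character vector; unmatched values become NA.
grouped <- group_levels(x, patterns)

# For a factor, recode the levels only; duplicated new labels merge levels.
f <- factor(x)
levels(f) <- group_levels(levels(f), patterns, default = "other group")
```

Adding a group is then a matter of adding one entry to patterns, which may be easier to maintain than a long case_when when the number of groups grows.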