Need to regroup categorical variable into just 5 groups based on string value in R

Time:12-16

I have a categorical variable with over 1000 levels. I want to reduce the dimensionality by collapsing these into just 5 general levels, grouping similar values together based on the text of the level names.

For example, all levels that contain the word "immune" I want to group into a new group called "immune group". All levels that contain the word "eyes" I want to group into a new group called "eye group", etc.

I've tried str_detect and grepl in R with little success. Are there any other methods that could do this efficiently?

CodePudding user response:

Maybe use case_when from dplyr together with str_detect. It would help to have a reproducible example, though.

CodePudding user response:

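library(dplyr)
library(stringr)

# Small reproducible example
x <- c("immune1", "immune2", "eyes1", "eyes2")

# The first matching condition wins; anything unmatched becomes NA
case_when(
  str_detect(x, "immune") ~ "immune group",
  str_detect(x, "eyes") ~ "eye group",
  TRUE ~ NA_character_
)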
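
With over 1000 levels, it may be cheaper to recode the levels of the factor once rather than testing every row. The following is a minimal base R sketch of that idea, not a definitive solution: the sample data, the `patterns` lookup, the `regroup` helper and the "other group" fallback are all hypothetical names chosen for illustration, and case-insensitive matching is an assumption about the data.

```r
# Hypothetical data: a factor whose level names contain a keyword
x <- factor(c("immune1", "Immune disorder", "eyes1", "dry eyes", "unrelated"))

# Hypothetical keyword -> group lookup; earlier entries take priority
patterns <- c(immune = "immune group", eyes = "eye group")

# Map each string to the group of the first keyword it contains
regroup <- function(values, patterns, default = "other group") {
  out <- rep(default, length(values))
  # Loop in reverse so that earlier patterns overwrite later ones
  for (key in rev(names(patterns))) {
    hit <- grepl(key, values, ignore.case = TRUE)  # assumption: ignore case
    out[hit] <- patterns[[key]]
  }
  out
}

# Recode the unique levels only; duplicated new names merge the levels
levels(x) <- regroup(levels(x), patterns)

as.character(x)
```

Because only `levels(x)` is passed through `grepl`, the pattern matching runs once per distinct level rather than once per observation. Extending the lookup to five groups would just mean adding more named entries to `patterns`.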