How to relevel a factor variable with over 500 levels efficiently in R

Time:11-10

I haven't been able to find any answers to this specific problem:

I have a factor variable with over 500 levels that I need to relevel to just 2 levels (1/0).

Many of the levels start with the same character string, e.g. "Woman's mother or sister:".

Is there a way to use starts_with to relevel all of these levels at once, instead of doing them one by one as I have been doing with this code:

   levels(DF1$MedicalCondition)[levels(DF1$MedicalCondition) == "Woman's mother or sister: sister"] <- "1"

Any help appreciated, thank you!

Answer:

tidyselect::starts_with() is designed for selecting columns by name inside dplyr-style functions such as select(), so it won't work on factor levels. For a plain character vector like levels(), use base R's startsWith():

# Rename every level that begins with the prefix to "1"; levels given the same name are merged
levels(DF1$MedicalCondition)[
  startsWith(levels(DF1$MedicalCondition), "Woman's mother or sister")
] <- "1"
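The snippet above renames only the matching levels, while the question asks for exactly two levels. A minimal sketch of the full 1/0 recode, assuming every level that does not start with the prefix should become "0"; the small DF1 below and its level names are hypothetical stand-ins for the real data:

```r
# Hypothetical toy data standing in for DF1; the level names are illustrative only
DF1 <- data.frame(
  MedicalCondition = factor(c(
    "Woman's mother or sister: sister",
    "Woman's mother or sister: mother",
    "Other: diabetes",
    "Woman's mother or sister: sister",
    "Other: none"
  ))
)

prefix <- "Woman's mother or sister"
lv <- levels(DF1$MedicalCondition)

# Assigning a vector as long as levels() renames every level in one step;
# levels that receive the same new name ("0" or "1") are merged into one level.
levels(DF1$MedicalCondition) <- ifelse(startsWith(lv, prefix), "1", "0")

print(levels(DF1$MedicalCondition))
print(table(DF1$MedicalCondition))
```

This works on the levels (500 or so strings) rather than on every row, so it stays fast even for large data frames. If you need a numeric 0/1 column, use as.integer(as.character(DF1$MedicalCondition)); calling as.integer() directly on the factor would return the internal level codes (1 and 2) instead.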

For more flexible matching, grepl() (base R) or stringr::str_detect() accept full regular expressions, so a single pattern can cover several prefixes or variations in spelling and capitalisation.
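As a sketch of the regex route, the example below uses grepl() on the levels with an anchored pattern that accepts two alternative prefixes and ignores case. The factor, its level names and the second prefix are hypothetical, chosen only to show what a single pattern can match:

```r
# Hypothetical example data; the level names are illustrative only
conditions <- factor(c(
  "Woman's mother or sister: sister",
  "Woman's father or brother: brother",
  "Other: diabetes",
  "woman's mother or sister: mother"
))

# "^" anchors the match at the start of each level; "|" allows alternative prefixes.
# ignore.case = TRUE also catches inconsistent capitalisation ("woman's" vs "Woman's").
pattern <- "^Woman's (mother or sister|father or brother)"
is_match <- grepl(pattern, levels(conditions), ignore.case = TRUE)

# Collapse all levels into "1" (matched) or "0" (everything else)
levels(conditions) <- ifelse(is_match, "1", "0")
print(as.character(conditions))
```

With stringr, the equivalent test would be str_detect(levels(conditions), regex(pattern, ignore_case = TRUE)).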
