I have a categorical variable with 169 levels. I want to reduce it to a manageable 7-10 categories, such as "Religion", "Culture & Art", "Education", "Animal Protection", "Emergency", "Environment Protection", "Social Service", etc.
I understand that I can use the levels() function to rename all 169 levels, but I am looking for a smarter option: can I use a keyword such as "Religion" or "Culture" as a filter to group all matching levels under one code?
Answer:
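You could do something like this: use str_detect() to test each label for a keyword, and case_when() to map every match to a broader category. See the documentation for str_detect for additional options.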
It's easier for someone to help you if you supply some usable minimal example data and the code you have tried, in the form of a reproducible example like the one below. Then we can run it and suggest improvements.
library(tidyverse)

# Minimal example data
data_df <- tribble(
  ~label,
  "Culture and Arts",
  "Education in Japanese",
  "Culture and Recreation",
  "culture & Environment",
  "Environmental Activities",
  "Education & research"
)

# Map each label to a broader category by keyword.
# case_when() uses the first condition that is TRUE; unmatched labels become "Other".
data_df2 <- data_df |>
  mutate(category = case_when(
    str_detect(label, "Cultu") ~ "Culture & Arts",
    str_detect(label, "Educ") ~ "Education",
    str_detect(label, "Environ") ~ "Environment",
    TRUE ~ "Other"
  ) |> factor())

data_df2
#> # A tibble: 6 × 2
#>   label                    category
#>   <chr>                    <fct>
#> 1 Culture and Arts         Culture & Arts
#> 2 Education in Japanese    Education
#> 3 Culture and Recreation   Culture & Arts
#> 4 culture & Environment    Environment
#> 5 Environmental Activities Environment
#> 6 Education & research     Education
levels(data_df2$category)
#> [1] "Culture & Arts" "Education" "Environment"
Created on 2022-04-23 by the reprex package (v2.0.1)
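Two refinements may be worth considering for the real 169-level data. First, str_detect() is case-sensitive by default, which is why row 4 ("culture & Environment") lands in Environment rather than Culture & Arts; wrapping a pattern in stringr's regex(..., ignore_case = TRUE) changes that. Second, with seven or more target categories it can be easier to keep the keywords in a named vector than in a long case_when(). The sketch below does both. The keyword patterns and sample labels are made up for illustration, and assign_category() is a hypothetical helper written for this example, not a stringr function.

```r
library(stringr)

# Hypothetical keyword table: names are the target categories, values are
# regular expressions. "|" lets one category collect several keywords.
# Order matters: the first pattern that matches a label wins.
keywords <- c(
  "Religion"               = "relig|church|temple|mosque",
  "Culture & Art"          = "cultur|\\barts?\\b",
  "Education"              = "educ|school",
  "Animal Protection"      = "animal",
  "Emergency"              = "emergenc|disaster",
  "Environment Protection" = "environ",
  "Social Service"         = "social"
)

# Hypothetical helper: assign each label to the first category whose
# pattern matches (ignoring case); anything unmatched, including NA
# labels, becomes `other`. Works on character vectors and factors.
assign_category <- function(label, keywords, other = "Other") {
  label <- as.character(label)
  category <- rep(NA_character_, length(label))
  for (nm in names(keywords)) {
    pattern <- regex(keywords[[nm]], ignore_case = TRUE)
    # only fill labels that have not been claimed by an earlier category
    hit <- which(is.na(category) & str_detect(label, pattern))
    category[hit] <- nm
  }
  category[is.na(category)] <- other
  factor(category, levels = c(names(keywords), other))
}

# Made-up labels standing in for the real 169 levels
labels <- c(
  "Culture and Arts", "culture & Environment", "Catholic Church Fund",
  "Animal Shelter", "Disaster Relief", "Education & research",
  "Environmental Activities", "Youth Social Services", "Chess Club"
)
category <- assign_category(labels, keywords)
data.frame(labels, category)
```

With a data frame, the helper can be used inside mutate(), e.g. data_df |> mutate(category = assign_category(label, keywords)). Keep the patterns reasonably specific: a bare "art" would also match words like "Earth" or "party", which is why the sketch wraps it in \b word boundaries.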