Rename & Reduce multiple similar observations in R

Time:04-24

I have a categorical variable with 169 levels. I want to reduce it to a manageable 7-10 categories, such as "Religion", "Culture & Art", "Education", "Animal Protection", "Emergency", "Environment Protection", and "Social Service".


I understand that I can use the levels() function to rename all 169 levels, but I am looking for a smarter option. For example, can I use "Religion" or "Culture" as a filter to group all matching levels under one code?

CodePudding user response:

You could do something like this. See the documentation on str_detect for additional options.

It's easier for someone to help if you supply minimal, usable example data and the code you attempted, as in the reproducible example below. Then we can run it and suggest improvements.

library(tidyverse)

data_df <- tribble(
  ~ label,
  "Culture and Arts",
  "Education in Japanese",
  "Culture and Recreation",
  "culture & Environment",
  "Environmental Activities",
  "Education & research"
) 

data_df2 <- data_df |>
  mutate(category = case_when(
    str_detect(label, "Cultu")   ~ "Culture & Arts",
    str_detect(label, "Educ")    ~ "Education",
    str_detect(label, "Environ") ~ "Environment",
    TRUE ~ "Other"
  ) |> factor())

data_df2
#> # A tibble: 6 × 2
#>   label                    category      
#>   <chr>                    <fct>         
#> 1 Culture and Arts         Culture & Arts
#> 2 Education in Japanese    Education     
#> 3 Culture and Recreation   Culture & Arts
#> 4 culture & Environment    Environment   
#> 5 Environmental Activities Environment   
#> 6 Education & research     Education

levels(data_df2$category)
#> [1] "Culture & Arts" "Education"      "Environment"
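
Note that row 4 ("culture & Environment") ends up under "Environment" because str_detect() is case-sensitive by default, so "Cultu" does not match the lowercase "culture". If such labels should count as culture instead, one option is to wrap each pattern in stringr's regex() with ignore_case = TRUE. This is only a sketch on the same toy labels; data_ci is a name made up for this illustration.

```r
library(dplyr)
library(stringr)

# Same labels as in the example above
data_df <- tibble(
  label = c(
    "Culture and Arts",
    "Education in Japanese",
    "Culture and Recreation",
    "culture & Environment",
    "Environmental Activities",
    "Education & research"
  )
)

# data_ci is a hypothetical name; each pattern now ignores case
data_ci <- data_df |>
  mutate(category = case_when(
    str_detect(label, regex("cultu", ignore_case = TRUE))   ~ "Culture & Arts",
    str_detect(label, regex("educ", ignore_case = TRUE))    ~ "Education",
    str_detect(label, regex("environ", ignore_case = TRUE)) ~ "Environment",
    TRUE ~ "Other"
  ) |> factor())

data_ci
```

case_when() uses the first condition that matches, so with case ignored row 4 is now assigned to "Culture & Arts". Reorder the conditions if a different category should take priority when a label matches more than one pattern.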

Created on 2022-04-23 by the reprex package (v2.0.1)
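
Since the question mentions levels(), the same grouping could also be done in base R directly on the factor levels: assigning the same name to several levels merges them. The sketch below assumes a toy factor; org_type, patterns, and the "Sports club" label are hypothetical and only illustrate the idea.

```r
# Toy factor standing in for the real 169-level variable (hypothetical data)
org_type <- factor(c(
  "Culture and Arts", "Education in Japanese", "Culture and Recreation",
  "culture & Environment", "Environmental Activities", "Education & research",
  "Sports club"
))

# New category name = pattern to search for in the old level names
patterns <- c(
  "Culture & Arts" = "cultu",
  "Education"      = "educ",
  "Environment"    = "environ"
)

old_levels <- levels(org_type)
new_levels <- rep("Other", length(old_levels))

# Loop in reverse so that earlier patterns win when a level matches several
for (name in rev(names(patterns))) {
  hit <- grepl(patterns[[name]], old_levels, ignore.case = TRUE)
  new_levels[hit] <- name
}

# Assigning duplicated names to levels() merges the corresponding levels
levels(org_type) <- new_levels
table(org_type)
```

With 169 levels, it is worth checking which old levels fell into "Other" (for example with old_levels[new_levels == "Other"]) and adding patterns until the remainder is small enough to handle by hand.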
