How to collapse levels in a categorical variable in R

Time:12-14

I have several categorical variables with more than 5 levels each. I want a function that can collapse each of them into just two levels.

column1<- c("bad","good","nice","fair","great","bad","bad","good","nice",
            "fair","great","bad")
column2<- c("john","ben","cook","seth","brian","deph","omar","mary",
            "frank","boss","kate","sall")

df<- data.frame(column1,column2)

So for the data frame above, in column1, I want to keep "bad" as "bad" and convert all other levels to "others" with a function. I have no idea how to do that. Thanks.

CodePudding user response:

Use ifelse() or case_when():

library(dplyr)
df <- df %>% 
   mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))
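
The ifelse() route mentioned above needs no packages. A minimal sketch on the question's data, assuming column1 is a character vector or a factor (the result is a character vector either way):

```r
column1 <- c("bad","good","nice","fair","great","bad","bad","good","nice",
             "fair","great","bad")
column2 <- c("john","ben","cook","seth","brian","deph","omar","mary",
             "frank","boss","kate","sall")
df <- data.frame(column1, column2)

# Keep "bad" as it is and replace every other value with "others"
df$column1 <- ifelse(df$column1 == "bad", "bad", "others")
```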

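Also, as there is only a single change, we can just do the following (this assumes column1 is a character vector, which is the data.frame() default since R 4.0.0; with a factor, "others" would first have to be added as a level):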

df$column1[df$column1 != "bad"] <- "others"
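
Since the question asks for a function, the same replacement can be wrapped in a small helper. This is only a sketch: collapse_levels and its arguments keep and other are hypothetical names chosen for illustration, not part of base R or dplyr.

```r
# Hypothetical helper (name and arguments are illustrative):
# keep the values listed in `keep`, relabel everything else as `other`.
collapse_levels <- function(x, keep, other = "others") {
  x <- as.character(x)                    # accept factors as well as character vectors
  x[!is.na(x) & !(x %in% keep)] <- other  # leave NA untouched, relabel the rest
  factor(x, levels = c(keep, other))      # return a factor with the collapsed levels
}

column1 <- c("bad","good","nice","fair","great","bad","bad","good","nice",
             "fair","great","bad")
collapsed <- collapse_levels(column1, keep = "bad")
```

To apply it to several columns at once, something like df[cols] <- lapply(df[cols], collapse_levels, keep = "bad") should work, where cols is a character vector of column names.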

CodePudding user response:

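A simple way to do this in base R is with indexing. The comparison returns TRUE/FALSE, and adding 1 turns that into index 2 for "bad" and index 1 for everything else:

c('others', 'bad')[(df$column1 == 'bad') + 1]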
#> [1] "bad"    "others" "others" "others" "others" "bad"    "bad"   
#> [8] "others" "others" "others" "others" "bad"  

CodePudding user response:

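# Levels are sorted alphabetically ("bad" comes first), so keep level 1 and merge the other four
df <- data.frame(factor = as.factor(column1), column2)
levels(df$factor) <- c("bad", rep("other", 4))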
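
The assignment above depends on "bad" being the first level and on there being exactly five levels. A sketch that avoids counting positions, assuming the same column1 vector as in the question (f is just an illustrative variable name): assigning the same name to several levels merges them.

```r
column1 <- c("bad","good","nice","fair","great","bad","bad","good","nice",
             "fair","great","bad")
f <- factor(column1)

# Rename every level except "bad" to "other"; duplicated level names are merged
levels(f)[levels(f) != "bad"] <- "other"
```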

CodePudding user response:

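Here is a dplyr solution with grouping. It starts a new group at every "bad" and labels the first row of each group "bad", so it assumes the first row of the data is "bad":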

library(dplyr)
df %>% 
  group_by(group = cumsum(column1=="bad")) %>% 
  mutate(column1 = ifelse(row_number()==1, "bad", "others")) %>% 
  ungroup() %>% 
  select(-group)

  column1 column2
   <chr>   <chr>  
 1 bad     john   
 2 others  ben    
 3 others  cook   
 4 others  seth   
 5 others  brian  
 6 bad     deph   
 7 bad     omar   
 8 others  mary   
 9 others  frank  
10 others  boss   
11 others  kate   
12 bad     sall   