I have several categorical variables with more than 5 levels each, and I want a function that can collapse each of them into just two levels.
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)
So for the data frame above, in column1, I want to keep "bad" as "bad" and convert every other level to "others" with a function. I have no idea how to do that. Thanks.
CodePudding user response:
Use ifelse or case_when:
library(dplyr)
df <- df %>%
  mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))
Also, since only one kind of change is needed, we can just assign to the matching rows directly:
df$column1[df$column1 != "bad"] <- "others"
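Since the question asks for a function, here is a minimal sketch of the ifelse variant wrapped in one. The name collapse_levels and its arguments keep and other are hypothetical, chosen only for illustration; the data is the same as in the question.

```r
# Hypothetical helper (not from any package): keep the values listed in
# `keep` and relabel everything else as `other`.
collapse_levels <- function(x, keep, other = "others") {
  x <- as.character(x)           # accept factor or character input
  ifelse(x %in% keep, x, other)  # NA is not in `keep`, so NA also becomes `other`
}

column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)

df$column1 <- collapse_levels(df$column1, keep = "bad")
table(df$column1)
```

To apply it to several columns at once, something like df[cols] <- lapply(df[cols], collapse_levels, keep = "bad") should work, where cols is a character vector of column names.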
CodePudding user response:
A simple way to do this in base R is with indexing:
c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad" "others" "others" "others" "others" "bad" "bad"
#> [8] "others" "others" "others" "others" "bad"
CodePudding user response:
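# as.factor() sorts the levels alphabetically: "bad", "fair", "good", "great", "nice"
df <- data.frame(factor = as.factor(column1), column2)
# Keep the first level ("bad") and merge the remaining four into one
levels(df$factor) <- c("bad", rep("others", 4))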
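The hard-coded 4 above only fits a factor with exactly five levels where "bad" sorts first. A sketch that should not depend on the number or order of levels, using only base R (the variable name f is just for illustration):

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
f <- factor(column1)

# Assigning the same label to several levels merges them into a single level.
levels(f)[levels(f) != "bad"] <- "others"

levels(f)
```

This works because giving several levels the same name collapses them, so every level other than "bad" ends up as "others" no matter how many there are.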
CodePudding user response:
Here is a dplyr solution with grouping:
library(dplyr)
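df %>%
  # Start a new group at every "bad" row
  group_by(group = cumsum(column1 == "bad")) %>%
  # The first row of each group is its "bad" row; this assumes the data starts with "bad"
  mutate(column1 = ifelse(row_number() == 1, "bad", "others")) %>%
  ungroup() %>%
  select(-group)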
   column1 column2
   <chr>   <chr>
 1 bad     john
 2 others  ben
 3 others  cook
 4 others  seth
 5 others  brian
 6 bad     deph
 7 bad     omar
 8 others  mary
 9 others  frank
10 others  boss
11 others  kate
12 bad     sall