Regrouping variables

I am trying to reorganise the categories in my variable (var) to make it binary.

Right now the variable contains 11 categories 0-10 and you can see value counts for each category below:

> summary(factor(mydf$var))
0    1    2    3    4    5    6    7    8    9   10  
61   59  111  259  277  959  280  259  151   28   53

I want to group the two extremes together, so that values 0-2 and 8-10 become 0 and values 3-7 become 1.

What's the best way to do this without creating new variables?

CodePudding user response：

One option is findInterval():

x <- 0:10
y <- +(findInterval(x, c(3, 8)) == 1)  # unary + turns TRUE/FALSE into 1/0

y
# [1] 0 0 0 1 1 1 1 1 0 0 0
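To see why this works, it may help to look at what findInterval() returns before the comparison: with break points c(3, 8) it gives 0 for values below 3, 1 for values from 3 up to (but not including) 8, and 2 for values of 8 and above. Below is a minimal sketch using a small made-up data frame (`toy` is a hypothetical name, not the asker's data) that also overwrites the column in place rather than adding a new one:

```r
# Hypothetical toy data standing in for mydf
toy <- data.frame(var = 0:10)

# Interval index for each value: 0 = below 3, 1 = [3, 8), 2 = 8 and above
idx <- findInterval(toy$var, c(3, 8))
idx
# [1] 0 0 0 1 1 1 1 1 2 2 2

# Keep only the middle interval as 1 and overwrite the column in place
toy$var <- as.integer(idx == 1)
toy$var
# [1] 0 0 0 1 1 1 1 1 0 0 0
```

The same assignment should carry over to the real data by swapping `toy` for `mydf`.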

CodePudding user response：

I guess you're categorising the var column in mydf.

Simulated data

library(dplyr)

set.seed(12)
mydf <- data.frame(var = sample(0:10, 1000, replace = TRUE))
summary(factor(mydf$var))
 0  1  2  3  4  5  6  7  8  9 10 
87 99 87 89 99 85 96 92 99 81 86

Re-group var with `mutate`

mydf2 <- mydf %>% mutate(var = if_else(var %in% 3:7, 1L, 0L))

Or in base R

mydf2 <- mydf  # copy first, so this also works without the dplyr step
mydf2$var <- as.integer(mydf2$var %in% 3:7)

Check output

Count of var == 1 should be 89 + 99 + 85 + 96 + 92 = 461.

summary(factor(mydf2$var))
  0   1 
539 461
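Another way to check the recoding, sketched here in base R only, is to cross-tabulate the original values against the recoded ones: every original value should land in exactly one column. The dimension labels `old` and `new` and the object name `check` are just chosen for this illustration.

```r
set.seed(12)
mydf <- data.frame(var = sample(0:10, 1000, replace = TRUE))

# Recode into a copy so the original values stay available for comparison
mydf2 <- mydf
mydf2$var <- as.integer(mydf2$var %in% 3:7)

# Rows are original values, columns are recoded values
check <- table(old = mydf$var, new = mydf2$var)
check

# Values 3-7 should appear only under new = 1, all others only under new = 0
stopifnot(all(check[as.character(3:7), "0"] == 0))
stopifnot(all(check[as.character(c(0:2, 8:10)), "1"] == 0))
```

If either stopifnot() call fails, some value was assigned to the wrong group.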
