I am trying to reorganise the categories in my variable (var) to make it binary.
Right now the variable contains 11 categories 0-10 and you can see value counts for each category below:
> summary(factor(mydf$var))
0 1 2 3 4 5 6 7 8 9 10
61 59 111 259 277 959 280 259 151 28 53
I want to group the two extremes together so that values 0-2 and 8-10 = 0 and values 3-7 = 1
What's the best way to do this without creating new variables?
CodePudding user response:
One option is findInterval():
x <- 0:10
y <- as.integer(findInterval(x, c(3, 8)) == 1)  # as.integer() turns TRUE/FALSE into 1/0
y
# [1] 0 0 0 1 1 1 1 1 0 0 0
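This works because findInterval() returns, for each value, the number of break points that are less than or equal to it, so with breaks c(3, 8) the values 0-2 land in bin 0, 3-7 in bin 1, and 8-10 in bin 2. Here is a minimal sketch of applying that directly to the column so no new variable is kept, assuming a data frame mydf with a numeric column var as in the question (the toy values below are made up for illustration):

```r
# Toy data standing in for the asker's mydf (hypothetical values)
mydf <- data.frame(var = c(0, 2, 3, 5, 7, 8, 10))

# findInterval() counts how many breaks are <= each value:
# 0-2 -> 0, 3-7 -> 1, 8-10 -> 2
bins <- findInterval(mydf$var, c(3, 8))

# Overwrite var in place: 1 for the middle bin, 0 for both extremes
mydf$var <- as.integer(bins == 1)
mydf$var
# [1] 0 0 1 1 1 0 0
```

The intermediate bins vector is only there to make the steps visible; the two lines can be collapsed into one assignment.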
CodePudding user response:
I guess you're categorising the var column in mydf.
Simulated data
library(dplyr)
set.seed(12)
mydf <- data.frame(var = sample(0:10, 1000, replace = TRUE))
summary(factor(mydf$var))
0 1 2 3 4 5 6 7 8 9 10
87 99 87 89 99 85 96 92 99 81 86
Re-group var with mutate
mydf2 <- mydf %>% mutate(var = if_else(var %in% 3:7, 1L, 0L))
Or in base R
mydf2 <- transform(mydf, var = as.integer(var %in% 3:7))  # also works if mydf2 does not exist yet
Check output
The count of var == 1 should be 89 + 99 + 85 + 96 + 92 = 461.
summary(factor(mydf2$var))
0 1
539 461
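Another way to check the recoding, beyond comparing totals, is to cross-tabulate the original values against the recoded ones. This is only a sketch: it rebuilds the simulated data from above, and the names recoded and check are hypothetical helpers introduced for the check.

```r
# Rebuild the simulated data used above
set.seed(12)
mydf <- data.frame(var = sample(0:10, 1000, replace = TRUE))

# Recode: 1 for values 3-7, 0 for the two extremes
recoded <- as.integer(mydf$var %in% 3:7)

# Rows are the original categories, columns the recoded values;
# each row should have all of its counts in exactly one column
check <- table(original = mydf$var, recoded = recoded)
check
```

If the recoding is right, rows 3 to 7 are zero in the "0" column and all other rows are zero in the "1" column.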