How to divide into 4 equal groups across multiple columns in R

Time:11-30

I am working with a large dataset and I wish to recode a large number of variables so that they are each divided into 4 equally sized groups.

I can do this for a single variable with the split_var function from the sjmisc package, as shown below:

library(dplyr)   # provides select()
library(sjmisc)

mtcars %>% select(mpg, cyl, disp)

split_var(mtcars, mpg, n = 4)

This produces a new column giving the group each row falls into, based on the original value. However, I cannot find a way to do this across multiple variables. It works if I type in the column name for each variable manually, as shown below:

split_var(mtcars, mpg, cyl, disp,  n = 4)

However, as I am working with a large dataset, I need a way that doesn't require me to type the name of each column manually. I have tried the equivalent of split_var(mtcars, c("mpg", "cyl", "disp"), n = 4), which produces an error:

> split_var(mtcars, c("mpg", "cyl", "disp"),  n = 4)
Error: Problem with `mutate()` input `..1`.
ℹ `..1 = c("mpg", "cyl", "disp")`.
ℹ `..1` must be size 32 or 1, not 3.

I think I might need lapply, but I am not sure how to use it in this context. Any help is appreciated!

CodePudding user response:

split_var accepts tidyselect helpers (select_helpers) for choosing columns, so you can do:

mtcars %>% 
  split_var(everything(), n = 4)

mtcars %>% 
  split_var(all_of(c("mpg","cyl")), n = 4)
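If sjmisc is not available, a similar result can probably be had with dplyr alone. The following is only a sketch, not what split_var does internally: it swaps in dplyr's ntile(), which ranks the values and deals them into n buckets of (near-)equal size, so tied values can end up in different groups. The vector cols and the "_q" suffix are names made up for illustration.

```r
library(dplyr)

# Columns to recode (illustrative; could equally come from names(df) or a pattern)
cols <- c("mpg", "cyl", "disp")

# Add one new "<name>_q" column per selected variable, holding the group number 1..4
quartiles <- mtcars %>%
  mutate(across(all_of(cols), ~ ntile(.x, 4), .names = "{.col}_q"))

head(quartiles)
```

Because ntile() assigns groups by rank, the groups stay equally sized even for a variable with few distinct values such as cyl, at the cost of splitting ties.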

CodePudding user response:

split_var only works for numeric variables as it uses quantiles, so you can do:

mtcars %>%
  sjmisc::split_var(where(is.numeric), n = 4)
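Since the question mentions lapply, here is a rough base R sketch of the same idea, assuming quantile-based cut points are acceptable. quantile_groups is a hypothetical helper written for this example (it is not part of sjmisc), and its handling of ties may differ from split_var.

```r
# Hypothetical helper (not from sjmisc): assign each value to one of up to
# n groups, using the sample quantiles of x as cut points.
quantile_groups <- function(x, n = 4) {
  probs <- seq(0, 1, length.out = n + 1)
  # Duplicate break points (heavily tied data) are dropped, so such a
  # variable can end up with fewer than n groups.
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

cols <- c("mpg", "cyl", "disp")   # columns to recode, chosen for illustration

grouped <- mtcars
# lapply runs the helper over each selected column; the results are stored
# as new columns with a "_g" suffix.
grouped[paste0(cols, "_g")] <- lapply(mtcars[cols], quantile_groups, n = 4)

head(grouped[c(cols, paste0(cols, "_g"))])
```

To cover every numeric column instead of a hand-written list, cols could be built with names(mtcars)[sapply(mtcars, is.numeric)].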