I am working with a large dataset and I wish to recode a large number of variables so that they are each divided into 4 equally sized groups.
I can do this for a single variable by using the split_var function from the sjmisc library, as shown below:
library(sjmisc)
library(dplyr)  # provides select()
mtcars %>% select(mpg, cyl, disp)
split_var(mtcars, mpg, n = 4)
This produces a new variable that indicates which group each row falls into, based on the original value. However, I cannot find a way to do this across multiple variables. It works if I manually type in the column name for each variable, as shown below:
split_var(mtcars, mpg, cyl, disp, n = 4)
However, as I am working with a large dataset, I need a way to avoid manually typing the name of each column. I have tried the equivalent of split_var(mtcars, c("mpg", "cyl", "disp"), n = 4)
which produces an error:
> split_var(mtcars, c("mpg", "cyl", "disp"), n = 4)
Error: Problem with `mutate()` input `..1`.
ℹ `..1 = c("mpg", "cyl", "disp")`.
ℹ `..1` must be size 32 or 1, not 3.
I think I might need lapply, but I am not sure how to use it in this context. Any help is appreciated!
CodePudding user response:
split_var accepts select helpers, so you can do:
mtcars %>%
split_var(everything(), n = 4)
mtcars %>%
split_var(all_of(c("mpg","cyl")), n = 4)
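Since the question mentions lapply, here is a minimal base R sketch of that route, without sjmisc. It is not what split_var does internally; it simply cuts each column at its own quantiles. The column selection `cols_to_split`, the helper `quartile_group` and the `_g` suffix are illustrative choices, not part of any package.

```r
# Hypothetical selection of columns held as a character vector
cols_to_split <- c("mpg", "disp", "hp")

# Illustrative helper: assign each value to one of n quantile-based groups
quartile_group <- function(x, n = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum value in the first group
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Apply the helper to every selected column and append the results
grouped <- lapply(mtcars[cols_to_split], quartile_group)
names(grouped) <- paste0(cols_to_split, "_g")
result <- cbind(mtcars, as.data.frame(grouped))

head(result[, c("mpg", "mpg_g", "disp", "disp_g")])
```

Note that cut() stops with an error when the quantile breaks are not unique, which happens for a column with few distinct values such as cyl, so this sketch only suits columns with enough spread.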
CodePudding user response:
split_var only works for numeric variables as it uses quantiles, so you can do:
mtcars %>%
sjmisc::split_var(where(is.numeric), n = 4)
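If the goal is strictly equal group sizes, a possible alternative is dplyr alone, using across() with ntile(). This is a different technique from split_var: ntile() assigns groups by rank, so tied values can end up in different groups. The sketch below assumes dplyr 1.0 or later, and the `_g` suffix is only an illustrative naming choice.

```r
library(dplyr)

# Add a rank-based quartile column (1 to 4) for every numeric column
out <- mtcars %>%
  mutate(across(where(is.numeric), ~ ntile(.x, 4), .names = "{.col}_g"))

# Group sizes for one of the new columns
table(out$mpg_g)
```

To restrict this to specific columns, where(is.numeric) can be swapped for all_of() with a character vector of names.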