I'd like to transform multiple variables into a discrete form using quantcut.
library(gtools)
library(dplyr)
quantcut(df$var3, q=4, na.rm = TRUE)
Works.
Now I'd like to apply this function to multiple variables. What I have is something like this:
var_col <- c(var3, var4, var5, var6)
df <- df %>%
mutate(across(all_of(var_col), quantcut(., q=4, na.rm = TRUE, .names = "cut_{col}"))
This gives me the error: "x can't combine year and country. The error occurred in group one: year = 1800."
The dataset looks something like this:
country <- c("GER", "ITA", "FRA")
year <- c("1800", "1801", "1802")
var3 <- c(1L, 2L, 3L)
var4 <- c(3L, 4L, 5L)
var5 <- c(6L, 7L, NA)
var6 <- c(8L, 9L, 10)
df <- data.frame(country, year, var3, var4, var5, var6)
Though I should say that the reprex I tried to make gave me a different error: "x non-numeric argument to binary operator". I guess the variable types differ from my real data, so I'll try to find a way to replicate my error exactly.
CodePudding user response:
Perhaps this is what you're after:
library(dplyr)
country <- c("GER", "ITA", "FRA")
year <- c("1800", "1801", "1802")
var3 <- c(1L, 2L, 3L)
var4 <- c(3L, 4L, 5L)
var5 <- c(6L, 7L, NA)
var6 <- c(8L, 9L, 10)
df <- data.frame(country, year, var3, var4, var5, var6)
your_func <- function(x){
  gtools::quantcut(x, q=4, na.rm = TRUE)
}
df %>%
  mutate(across(where(is.numeric), your_func))
The output:
  country year    var3    var4     var5     var6
1     GER 1800 [1,1.5] [3,3.5] [6,6.25]  [8,8.5]
2     ITA 1801 (1.5,2] (3.5,4] (6.75,7]  (8.5,9]
3     FRA 1802 (2.5,3] (4.5,5]     <NA> (9.5,10]
EDIT
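If you need to specify which columns, pass their names as a character vector and wrap it in all_of() (using a bare external vector inside across() is deprecated in tidyselect):
var_col <- c("var3", "var4", "var5", "var6")
df %>%
  mutate(across(all_of(var_col), your_func))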
The output is the same as above.
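If you would rather keep the original columns and add the cut versions next to them, which seems to be what the .names = "cut_{col}" in the question was aiming for, a minimal sketch could look like the one below. It assumes the small example data from the question, and it relies on .names being an argument of across(), not of quantcut(). The name df_cut is just made up for this illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  country = c("GER", "ITA", "FRA"),
  year    = c("1800", "1801", "1802"),
  var3    = c(1L, 2L, 3L),
  var4    = c(3L, 4L, 5L),
  var5    = c(6L, 7L, NA),
  var6    = c(8L, 9L, 10)
)

# Column names must be quoted strings for all_of()
var_col <- c("var3", "var4", "var5", "var6")

# df_cut is a hypothetical name used only in this sketch
df_cut <- df %>%
  mutate(across(
    all_of(var_col),
    ~ gtools::quantcut(.x, q = 4, na.rm = TRUE),  # function applied to each column
    .names = "cut_{.col}"                          # belongs to across(), not quantcut()
  ))

names(df_cut)
```

The original var3 to var6 stay untouched and the new factor columns are appended as cut_var3 to cut_var6.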
CodePudding user response:
The error occurs because the values of year and country are not continuous. The package documentation clearly states that x has to be a "Continuous variable." For more info, use ?quantcut or visit: https://www.rdocumentation.org/packages/gtools/versions/3.9.2/topics/quantcut
You could solve this problem for year by converting it to an integer using as.integer(). country, however, cannot be converted to a continuous variable without losing information. quantcut() does not work on factors either. You could try leaving country out of the mutation, if that is an option.
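As a rough sketch of that suggestion, assuming the example data from the question: year is first converted with as.integer(), and country is excluded from the selection with a minus sign. The name df_num is made up for this illustration, and whether cutting year into quantiles is meaningful depends on your analysis.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  country = c("GER", "ITA", "FRA"),
  year    = c("1800", "1801", "1802"),
  var3    = c(1L, 2L, 3L),
  var4    = c(3L, 4L, 5L),
  var5    = c(6L, 7L, NA),
  var6    = c(8L, 9L, 10)
)

# df_num is a hypothetical name used only in this sketch
df_num <- df %>%
  mutate(year = as.integer(year)) %>%   # character -> integer so quantcut() can handle it
  mutate(across(
    -country,                           # every column except country
    ~ gtools::quantcut(.x, q = 4, na.rm = TRUE)
  ))

str(df_num)
```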