R: Mutate across multiple variables with quantcut

Time:12-17

I'd like to transform multiple variables into a discrete form using quantcut.

library(gtools)
library(dplyr)

quantcut(df$var3, q=4, na.rm = TRUE) 

This works.

Now I'd like to apply this function to multiple variables. What I have is something like this:

var_col <- c(var3, var4, var5, var6) 
df <- df %>% 
     mutate(across(all_of(var_col), quantcut(., q=4, na.rm = TRUE, .names = "cut_{col}"))

This gives me the error: "x can't combine year and country. The error occurred in group one: year = 1800."

The dataset looks something like this:

country <- c("GER", "ITA", "FRA") 
year <- c("1800", "1801", "1802") 
var3 <- c(1L, 2L, 3L) 
var4 <- c(3L, 4L, 5L) 
var5 <- c(6L, 7L, NA) 
var6 <- c(8L, 9L, 10) 
df <- data.frame(country, year, var3, var4, var5, var6) 

That said, with the reprex I made I got a different error, "x non-numeric argument to binary operator", so I guess the variable types differ from my real data. I'll try to find a way to replicate my error exactly.
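One quick way to check the variable types is to inspect each column's class before calling mutate(). A minimal sketch using the reprex data above; in R 4.0 or later, data.frame() keeps strings as character rather than converting them to factors, so country and year should appear as character. The names col_classes and numeric_cols are just illustrative.

```r
# Rebuild the reprex data from the question
country <- c("GER", "ITA", "FRA")
year <- c("1800", "1801", "1802")
var3 <- c(1L, 2L, 3L)
var4 <- c(3L, 4L, 5L)
var5 <- c(6L, 7L, NA)
var6 <- c(8L, 9L, 10)
df <- data.frame(country, year, var3, var4, var5, var6)

# Class of every column; quantcut() needs numeric input
col_classes <- sapply(df, class)
print(col_classes)

# Names of the columns that are numeric and so can be passed to quantcut()
numeric_cols <- names(df)[sapply(df, is.numeric)]
print(numeric_cols)
```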

Answer:

Perhaps this is what you're after:

library(dplyr)

country <- c("GER", "ITA", "FRA") 
year <- c("1800", "1801", "1802") 
var3 <- c(1L, 2L, 3L) 
var4 <- c(3L, 4L, 5L) 
var5 <- c(6L, 7L, NA) 
var6 <- c(8L, 9L, 10) 
df <- data.frame(country, year, var3, var4, var5, var6) 

your_func <- function(x){
  gtools::quantcut(x, q=4, na.rm = TRUE)
}

df %>% 
  mutate(across(where(is.numeric), your_func))

The output:

  country year    var3    var4     var5     var6
1     GER 1800 [1,1.5] [3,3.5] [6,6.25]  [8,8.5]
2     ITA 1801 (1.5,2] (3.5,4] (6.75,7]  (8.5,9]
3     FRA 1802 (2.5,3] (4.5,5]     <NA> (9.5,10]

EDIT

If you need to specify which columns:

var_col <- c("var3", "var4", "var5", "var6") 

df %>% 
  mutate(across(all_of(var_col), your_func))

The output is the same as above.
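The question's attempt also put .names inside the quantcut() call rather than passing it to across(), and built var_col from bare names instead of strings. A sketch of the call that was probably intended, which keeps the original columns and adds binned copies named cut_var3, cut_var4, and so on. It assumes dplyr 1.0 or later (for across() and its .names argument) and that gtools is installed; the ~ ... .x anonymous-function form is one of several ways to pass extra arguments to quantcut(), and the `{.col}` placeholder stands for each selected column's name.

```r
library(dplyr)

country <- c("GER", "ITA", "FRA")
year <- c("1800", "1801", "1802")
var3 <- c(1L, 2L, 3L)
var4 <- c(3L, 4L, 5L)
var5 <- c(6L, 7L, NA)
var6 <- c(8L, 9L, 10)
df <- data.frame(country, year, var3, var4, var5, var6)

# Column names as strings, wrapped in all_of() inside across()
var_col <- c("var3", "var4", "var5", "var6")

# .names belongs to across(), not to quantcut();
# the original columns are kept and cut_* columns are added
binned <- df %>%
  mutate(across(all_of(var_col),
                ~ gtools::quantcut(.x, q = 4, na.rm = TRUE),
                .names = "cut_{.col}"))

print(binned)
```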

Answer:

The error occurs because year and country are not continuous variables. The package documentation clearly states that x has to be a "Continuous variable." For more info, use ?quantcut or visit: https://www.rdocumentation.org/packages/gtools/versions/3.9.2/topics/quantcut
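To see the failure directly, here is a minimal sketch (assuming gtools is installed) that compares quantcut() on a numeric vector with quantcut() on a character vector like country. In the reprex, country and year are stored as character, so this is plausibly where the "non-numeric argument to binary operator" error in the question comes from, though that is an inference rather than something confirmed in the thread.

```r
# Assumes the gtools package is installed
library(gtools)

# A numeric (continuous) vector: quantcut() returns a factor of quartile bins
num_bins <- quantcut(c(1, 2, 3), q = 4, na.rm = TRUE)
print(num_bins)

# A character vector such as `country`: quantile() cannot interpolate
# between strings, so quantcut() stops with an error
char_result <- tryCatch(
  quantcut(c("GER", "ITA", "FRA"), q = 4, na.rm = TRUE),
  error = function(e) e
)
print(conditionMessage(char_result))
```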

You could solve this problem for year by converting it to an integer with as.integer(). country, however, cannot be converted to a continuous variable without losing information, and quantcut() does not work on factors either. You could try leaving country out of the mutation, if that is an option.
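A minimal sketch of that suggestion on the reprex data: convert year to an integer first, then bin every column except country. This assumes dplyr 1.0 or later and gtools; whether binning year is meaningful for the real analysis is up to you.

```r
library(dplyr)

country <- c("GER", "ITA", "FRA")
year <- c("1800", "1801", "1802")
var3 <- c(1L, 2L, 3L)
var4 <- c(3L, 4L, 5L)
var5 <- c(6L, 7L, NA)
var6 <- c(8L, 9L, 10)
df <- data.frame(country, year, var3, var4, var5, var6)

binned <- df %>%
  # year is stored as text; turn it into an integer first
  mutate(year = as.integer(year)) %>%
  # bin every column except country, which cannot be made continuous
  mutate(across(-country, ~ gtools::quantcut(.x, q = 4, na.rm = TRUE)))

print(binned)
```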
