categoricalToNumeric function in R with indefinite quantity of variables (Variadic function)

Time:10-22

I would like to do the following with a function:

categoricalToNumeric <- function(data,...) {
    for(i in list(...)) {
      data$i <- as.numeric(as.factor(data$i))
    }
  summary(data)
}

Then call,

categoricalToNumeric(data, 'school', 'sex', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 'reason', 'nursery', 'internet', 'guardian.x', 'schoolsup.x', 'famsup.x', 'paid.x', 'activities.x', 'higher.x', 'romantic.x', 'guardian.y', 'schoolsup.y', 'famsup.y', 'paid.y', 'activities.y', 'higher.y', 'romantic.y')

Currently, there is no error, but the data variable does not mutate upon the categoricalToNumeric call.

The data: https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip

The setup:

data_mat=read.table("./data/csv/student-mat.csv",sep=";",header=TRUE)
data_por=read.table("./data/csv/student-por.csv",sep=";",header=TRUE)


data=merge(data_mat,data_por,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
print(nrow(data)) # 382 rows

head(data,5)

CodePudding user response:

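This runs without error, but the result is not what was intended. For convenience, I replaced ... with colnames(data), so the loop covers every column. Note that all columns in the summary below are identical: data$i does not use the loop variable i, so each column is overwritten with the same values (most likely the internet column, which $ finds through partial matching on the name i).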

categoricalToNumeric2 <- function(data,...) {
  for(i in colnames(data)) {
    data[i] <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
categoricalToNumeric2(data)

    school           sex             age           address         famsize         Pstatus           Medu            Fedu      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
      Mjob            Fjob           reason         nursery         internet       guardian.x     traveltime.x    studytime.x   
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
   failures.x     schoolsup.x       famsup.x         paid.x       activities.x      higher.x       romantic.x       famrel.x    
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
   freetime.x       goout.x          Dalc.x          Walc.x         health.x       absences.x         G1.x            G2.x      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
      G3.x         guardian.y     traveltime.y    studytime.y      failures.y     schoolsup.y       famsup.y         paid.y     
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
  activities.y      higher.y       romantic.y       famrel.y       freetime.y       goout.y          Dalc.y          Walc.y     
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
    health.y       absences.y         G1.y            G2.y            G3.y      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
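
The identical summaries above are consistent with how $ treats its right-hand side. A minimal sketch on a toy data frame (the data and the names df, via_dollar and via_brackets are made up for illustration, not taken from the student data set) shows the difference between $ and [[ when the column name is stored in a variable:

```r
# Toy data frame (hypothetical, not the student data set).
df <- data.frame(sex = c("F", "M", "F"),
                 internet = c("yes", "no", "yes"),
                 stringsAsFactors = FALSE)

i <- "sex"

# `$` does not evaluate the variable i; it looks for a column named "i".
# No such column exists, but partial matching picks up "internet".
via_dollar <- df$i

# `[[` evaluates i first, so it returns the "sex" column.
via_brackets <- df[[i]]

print(via_dollar)
print(via_brackets)
```

Replacing data$i with data[[i]] on both sides of the assignment inside the loop would therefore be the smallest change that makes the loop operate on the intended column.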

CodePudding user response:

data$i does not use the loop variable: $ looks for a column literally named i, not the column whose name is stored in i. Use [[ to extract a single column by name, or [ to extract several. As an alternative to the for loop, you can use lapply.

categoricalToNumeric <- function(data,...) {
  cols <- c(...)
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  summary(data)
}

categoricalToNumeric(data, 'school', 'sex', ...rest of the columns)
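
One more point, since the question mentions that data is not changed after the call: R passes arguments with copy-on-modify semantics, so assignments inside a function only affect a local copy. The following is a sketch under that assumption, using a hypothetical variant named categoricalToNumericDf that returns the modified data frame instead of its summary, and a small made-up data frame in place of the merged student data:

```r
# Sketch: same conversion as above, but the modified data frame is returned.
categoricalToNumericDf <- function(data, ...) {
  cols <- c(...)  # column names passed as strings
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  data            # return the modified copy
}

# Toy data (hypothetical stand-in for the merged student data).
toy <- data.frame(school = c("GP", "MS", "GP"),
                  sex    = c("F", "M", "M"),
                  age    = c(15, 16, 17),
                  stringsAsFactors = FALSE)

# Reassign the result; without this, `toy` stays unchanged.
toy <- categoricalToNumericDf(toy, "school", "sex")
str(toy)
```

Calling summary(toy) afterwards gives the same kind of overview as the original function, while the converted columns remain available for later steps.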