I would like to do the following with a function:
categoricalToNumeric <- function(data, ...) {
  for (i in list(...)) {
    data$i <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
Then call,
categoricalToNumeric(data, 'school', 'sex', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 'reason', 'nursery', 'internet', 'guardian.x', 'schoolsup.x', 'famsup.x', 'paid.x', 'activities.x', 'higher.x', 'romantic.x', 'guardian.y', 'schoolsup.y', 'famsup.y', 'paid.y', 'activities.y', 'higher.y', 'romantic.y')
Currently there is no error, but the data variable is not modified by the categoricalToNumeric call.
The data: https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip
The setup:
data_mat=read.table("./data/csv/student-mat.csv",sep=";",header=TRUE)
data_por=read.table("./data/csv/student-por.csv",sep=";",header=TRUE)
data=merge(data_mat,data_por,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
print(nrow(data)) # 382 data
head(data,5)
CodePudding user response:
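It's very weird, but this runs without error. For convenience, I changed ... to colnames(data). Note that data$i on the right-hand side still does not use the loop variable, so every column receives the same values, which is why all the columns in the summary below are identical. Using data[[i]] there would convert each column from its own values.
categoricalToNumeric2 <- function(data, ...) {
  for (i in colnames(data)) {
    # data$i looks up the literal name "i", not the value of the loop variable
    data[i] <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
categoricalToNumeric2(data)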
school sex age address famsize Pstatus Medu Fedu
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
Mjob Fjob reason nursery internet guardian.x traveltime.x studytime.x
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
failures.x schoolsup.x famsup.x paid.x activities.x higher.x romantic.x famrel.x
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
freetime.x goout.x Dalc.x Walc.x health.x absences.x G1.x G2.x
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
G3.x guardian.y traveltime.y studytime.y failures.y schoolsup.y famsup.y paid.y
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
activities.y higher.y romantic.y famrel.y freetime.y goout.y Dalc.y Walc.y
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
health.y absences.y G1.y G2.y G3.y
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848 Mean :1.848
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
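The identical statistics in every column above are consistent with how $ resolves names: it takes the name literally and, on a data frame, falls back to partial matching, so data$i can silently pick up a column such as internet. The following is a minimal sketch of that behaviour on a toy data frame (the values are made up for illustration and are not the student data set), followed by the same loop written with [[:

```r
# Toy data frame with hypothetical values, not the student data set
df <- data.frame(
  school   = c("GP", "MS", "GP"),
  internet = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

i <- "school"

# `$` does not evaluate i: it looks for a column named "i".
# No column has that exact name, so partial matching picks "internet".
identical(df$i, df$internet)   # TRUE

# `[[` evaluates i, so this is the "school" column.
identical(df[[i]], df$school)  # TRUE

# Convert each column from its own values
for (i in colnames(df)) {
  df[[i]] <- as.numeric(as.factor(df[[i]]))
}
df$school    # 1 2 1
df$internet  # 2 1 2
```

Factor levels are sorted alphabetically by default, which is why "no" maps to 1 and "yes" to 2 in this sketch.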
CodePudding user response:
data$i is not a valid way to extract a column in a loop, because $ takes the name literally instead of evaluating i. You may use [[ for a single column or [ for multiple columns. An alternative to a for loop is lapply.
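categoricalToNumeric <- function(data, ...) {
  cols <- c(...)  # column names passed as strings
  # Replace each selected column with its integer factor codes
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  summary(data)
}
categoricalToNumeric(data, 'school', 'sex')  # followed by the rest of the columns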
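One more point about the data variable not changing: R functions work on a copy of their arguments, and the function above returns summary(data), so the caller's data frame stays as it was. A minimal sketch of a variant that returns the converted data frame for reassignment could look like this (categoricalToNumericDf and the toy students data frame are hypothetical names used only for illustration):

```r
# Hypothetical variant: returns the converted data frame instead of a summary
categoricalToNumericDf <- function(data, ...) {
  cols <- c(...)
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  data  # the last expression is the return value
}

# Toy data with made-up values
students <- data.frame(
  sex = c("F", "M", "F"),
  age = c(18, 17, 15),
  stringsAsFactors = FALSE
)

converted <- categoricalToNumericDf(students, "sex")
converted$sex  # 1 2 1
students$sex   # "F" "M" "F": the original is unchanged until you reassign it
```

With the real data this would be used as data <- categoricalToNumericDf(data, 'school', 'sex') and so on, after which summary(data) reflects the converted columns.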