I tried to convert the categorical features in a dataset to factors. However, using apply with as.factor did not work:
convert <- c(2:5, 7:9,11,16:17)
read_file[,convert] <- data.frame(apply(read_file[convert], 2, as.factor))
However, switching to lapply did work:
read_file[,convert] <- data.frame(lapply(read_file[convert], as.factor))
Can someone explain what the difference is, and why the second snippet works while the first fails?
CodePudding user response:
In this case apply returns a matrix, and a matrix cannot contain a factor variable. Factor variables are coerced to character variables if you create a matrix from them. The documentation in help("apply") says:
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
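To see the coercion directly, here is a minimal sketch on a toy data frame. The data frame df and its column names are made up for illustration and are not from the question.

```r
# Hypothetical toy data; the names df, grp and size are assumptions.
df <- data.frame(
  id   = 1:4,
  grp  = c("a", "b", "a", "c"),
  size = c("S", "M", "S", "L"),
  stringsAsFactors = FALSE
)

# as.factor is called on each column, but apply() then packs the
# results back into a single matrix, which drops the factor class.
res <- apply(df[c("grp", "size")], 2, as.factor)

is.matrix(res)            # TRUE
is.character(res)         # TRUE
is.factor(res[, "grp"])   # FALSE
```

So by the time the result reaches data.frame(), the factors are already gone and only character data is left.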
lapply returns a list, and a list can contain (almost) anything. In fact, a data.frame is just a list with some additional attributes. You don't even need to call data.frame there; you can subset-assign a list directly into a data.frame.
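As a rough sketch of that last point, the list returned by lapply can be assigned straight into the selected columns. The toy data frame and the positions in convert are made up for illustration.

```r
# Hypothetical toy data; the names df, grp and size are assumptions.
df <- data.frame(
  id   = 1:4,
  grp  = c("a", "b", "a", "c"),
  size = c("S", "M", "S", "L"),
  stringsAsFactors = FALSE
)

convert <- c(2, 3)  # positions of the columns to convert

# lapply() returns a list with one factor per column; assigning that
# list into df[convert] keeps each element's class intact.
df[convert] <- lapply(df[convert], as.factor)

sapply(df, class)   # id: "integer", grp: "factor", size: "factor"
is.list(df)         # TRUE, a data.frame is a list underneath
```

Applied to the question, the same pattern would read read_file[convert] <- lapply(read_file[convert], as.factor).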