I have a large dataframe with 500 columns with a variety of classes in a variety of orders. Example data:
col1 col2 col3 col4 col5 col6
2 red 1.2 5 5 5.7
4 banana 2.3 0 6 5.2
8 two 2.4 9 8 5.4
2 first 1.6 7 9 5.9
I am trying to change the class of the columns based on what their current class is. For example, I want to change every column that is currently class "integer" to class "numeric", but leave all the columns that are currently class "character" untouched. Is this possible?
I tried using a for loop, but when I set it up this way, it reads the column classes as "data.frame" instead of "numeric".
for(i in 1:ncol(df)){
ifelse(class(df[,i])=="integer",
as.numeric(df[,i]),NA)
}
I also tried to use apply but couldn't figure out how to code the condition correctly or have it loop through only some of my columns without creating a vector specifically naming all the columns with the class type I want changed.
cols <- colnames(df[,which(is.Class("integer"))])
df[,cols] = apply(df[,cols], 2, function(x) as.numeric(x));
Does anyone know if it's possible to do this and if so, how? Thank you!
CodePudding user response:
ifelse is made for vectors and requires that the length (or shape) of the result is the same as the length (or shape) of the input. It works great when you have a vector as input and want a modified vector as output.
In this case, you want to change the class of a column, so we want if(){}else{}, not ifelse().
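To make the difference concrete, here is a minimal sketch using a made-up integer vector x (not part of your data). The test class(x) == "integer" has length one, so ifelse() hands back a single element, while if/else returns the whole converted vector:

```r
x <- c(1L, 2L, 3L)  # hypothetical stand-in for one integer column

# The test is a single TRUE, so ifelse() returns a result of length 1
res_ifelse <- ifelse(class(x) == "integer", as.numeric(x), NA)
length(res_ifelse)

# if/else evaluates one branch and returns it whole
res_if <- if (is.integer(x)) as.numeric(x) else x
length(res_if)
class(res_if)
```

Note also that the loop in the question never assigns the result back into df, so even a correct conversion would be thrown away.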
Also, it's much safer to use the is.* functions (like is.integer, is.numeric) than class(x) == to check a class, because an object can have multiple classes.
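As a small illustration of the multiple-class point, an ordered factor carries two classes. The vector f below is invented for the example; the comparison with == yields one result per class, whereas is.factor() and inherits() each give a single answer:

```r
# Hypothetical ordered factor, used only to show an object with two classes
f <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)

class(f)                      # "ordered" "factor"

cmp <- class(f) == "factor"   # one logical value per class, not a single TRUE/FALSE
length(cmp)

is.factor(f)                  # a single TRUE
inherits(f, "factor")         # a single TRUE
```

A length-two condition like cmp is not something you want inside if(), which is why the is.* helpers are the safer check.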
Lastly, it can be slightly safer to refer to a single column of a data frame as data[[col]] than data[, col]. The [[ ensures that we get a single column out as a vector, not a 1-column data frame. ("tibbles" and other data.frame-like objects have different behavior with data[, col], but data[[col]] is safe.)
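A quick sketch of the difference, using a small made-up data frame. A base data.frame drops data[, col] to a vector by default, so drop = FALSE and single-bracket list-style indexing are used here to stand in for the keep-it-a-data-frame behavior you would see with a tibble:

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))  # hypothetical data

class(df[["a"]])                # the column itself: "integer"
class(df["a"])                  # a 1-column data frame: "data.frame"
class(df[, "a", drop = FALSE])  # also a 1-column data frame: "data.frame"
```

If class() reports "data.frame" inside your loop, this is the likely reason: the object being tested is a 1-column data frame, not the column vector.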
Making those fixes to your for loop:
for (i in seq_along(df)) {
  if (is.integer(df[[i]])) {
    df[[i]] <- as.numeric(df[[i]])
  }
}
We could use lapply as well:
int_cols <- sapply(df, is.integer)
df[int_cols] <- lapply(df[int_cols], as.numeric)
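To check that this does what you want, here is a sketch run on a few columns rebuilt from your example data. The column types are my assumption, since the printed table does not show them:

```r
# Sample data modeled on the question; integer vs. double types are assumed
df <- data.frame(
  col1 = c(2L, 4L, 8L, 2L),
  col2 = c("red", "banana", "two", "first"),
  col3 = c(1.2, 2.3, 2.4, 1.6),
  col4 = c(5L, 0L, 9L, 7L),
  stringsAsFactors = FALSE
)

before <- sapply(df, class)   # classes before conversion

# Logical vector marking the integer columns, then convert only those
int_cols <- sapply(df, is.integer)
df[int_cols] <- lapply(df[int_cols], as.numeric)

after <- sapply(df, class)    # classes after conversion
```

Comparing before and after should show col1 and col4 switching from "integer" to "numeric", with the character column left alone.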
CodePudding user response:
The across function in dplyr can be very helpful here. Try something like:
library(dplyr)
# mutate() returns a new data frame, so assign the result back
df <- mutate(df, across(where(is.integer), as.numeric))