Is it possible to change the class of multiple columns in a dataframe based on their current class?

Time:08-05

I have a large dataframe with 500 columns with a variety of classes in a variety of orders. Example data:

col1    col2    col3    col4    col5    col6
2       red     1.2     5       5       5.7
4       banana  2.3     0       6       5.2
8       two     2.4     9       8       5.4
2       first   1.6     7       9       5.9

I am trying to change the class of the columns based on what their current class is. For example, I want to change every column that is currently class "integer" to class "numeric", but leave all the columns that are currently class "character" untouched. Is this possible?

I tried using a for loop, but when I set it up this way, it reads the column classes as "data.frame" instead of "numeric".

for(i in 1:ncol(df)){
  ifelse(class(df[,i])=="integer",
         as.numeric(df[,i]),NA)
}

I also tried to use apply but couldn't figure out how to code the condition correctly or have it loop through only some of my columns without creating a vector specifically naming all the columns with the class type I want changed.

cols <- colnames(df[,which(is.Class("integer"))])
df[,cols] = apply(df[,cols], 2, function(x) as.numeric(x));

Does anyone know if it's possible to do this and if so, how? Thank you!

CodePudding user response:

ifelse is made for vectors and requires that the length (or shape) of the result is the same as the length (or shape) of the input. It works great when you have a vector as input and want a modified vector as output.

In this case, you want to change the class of a column, so we want if(){}else{} not ifelse().
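To make the difference concrete, here is a small sketch (the vector x is made up for illustration). Because a class test gives a single TRUE or FALSE, ifelse() hands back a single element, while if/else returns the whole converted vector. Note also that the original loop never assigns its result back to df, so nothing would change even if the condition worked.

```r
x <- c(1L, 2L, 3L)  # hypothetical integer vector standing in for one column

# ifelse() works element-wise: one output element per element of the test
sizes <- ifelse(x > 1L, "big", "small")

# The class test has length 1, so ifelse() returns only one element of `yes`
one_value <- ifelse(is.integer(x), as.numeric(x), NA)

# if/else evaluates the condition once and returns the whole converted vector
whole_vector <- if (is.integer(x)) as.numeric(x) else x

length(one_value)    # 1
class(whole_vector)  # "numeric"
```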

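Also, it's much safer to use the is.* functions (like is.integer, is.numeric) than class(x) == "..." to check a class. An object can have multiple classes, in which case class(x) is a vector of length greater than one and the == comparison returns several values instead of a single TRUE or FALSE.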
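As a quick illustration of the multiple-class problem, an ordered factor carries two classes. This is only a sketch with a made-up factor f; it is not taken from the question's data.

```r
f <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)

class(f)                       # "ordered" "factor"
cmp <- class(f) == "factor"    # one result per class, so length 2
length(cmp)                    # 2

# The is.* test (or inherits()) always gives a single TRUE/FALSE
is.factor(f)                   # TRUE
inherits(f, "factor")          # TRUE
```

A length-two condition like cmp is not usable inside if(); recent versions of R stop with an error in that case.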

Lastly, it can be slightly safer to refer to a single column of a data frame as data[[col]] than data[, col]. The [[ ensures that we get a single column out as a vector, not a 1-column data frame. ("tibbles" and other data.frame-like objects have different behavior with data[, col], but data[[col]] is safe.)
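A minimal sketch of the difference on a plain base data.frame (the toy data frame d is an assumption, not the asker's data). With a tibble, d[, 1] would behave like the drop = FALSE line and return a one-column table.

```r
d <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

class(d[[1]])                 # "integer": always the column itself
class(d[, 1])                 # "integer" for a base data.frame
class(d[1])                   # "data.frame": single-bracket list-style indexing
class(d[, 1, drop = FALSE])   # "data.frame": what tibble-like objects do by default
```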

Making those fixes to your for loop:

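# Loop over column positions; seq_along() also copes with a zero-column data frame
for (i in seq_along(df)) {
  # Convert only the columns that are currently integer
  if (is.integer(df[[i]])) {
    df[[i]] <- as.numeric(df[[i]])
  }
}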

We could do lapply as well:

int_cols <- sapply(df, is.integer)
df[int_cols] <- lapply(df[int_cols], as.numeric)
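Here is a self-contained run of that approach on the example data from the question. The column types are an assumption (col1, col4 and col5 stored as integer, col3 and col6 as double), and vapply() is swapped in for sapply() only because it guarantees a logical vector.

```r
# Rebuild the sample data; the L suffix makes the whole-number columns integer
df <- data.frame(
  col1 = c(2L, 4L, 8L, 2L),
  col2 = c("red", "banana", "two", "first"),
  col3 = c(1.2, 2.3, 2.4, 1.6),
  col4 = c(5L, 0L, 9L, 7L),
  col5 = c(5L, 6L, 8L, 9L),
  col6 = c(5.7, 5.2, 5.4, 5.9),
  stringsAsFactors = FALSE
)

before <- sapply(df, class)

# Logical vector marking which columns are integer
int_cols <- vapply(df, is.integer, logical(1))

# Convert just those columns; everything else is left untouched
df[int_cols] <- lapply(df[int_cols], as.numeric)

after <- sapply(df, class)
rbind(before, after)   # compare classes column by column
```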

CodePudding user response:

The across function in dplyr can be very helpful here. Try something like:

library(dplyr)

# mutate() returns a new data frame, so assign the result back to keep the change
df <- mutate(df, across(where(is.integer), as.numeric))