Automatically Find Best Class for Column in Data Frame Based on Values in this Column in R

Time:12-27

I have a data frame where all columns have the character class. I want to automatically convert each column to the class that fits its data "best".

Consider the following example data:

data <- data.frame(x1 = letters[1:5],
                   x2 = as.character(1:5),
                   x3 = as.character(seq(0.2, 1, 0.2)))
data
  x1 x2  x3
1  a  1 0.2
2  b  2 0.4
3  c  3 0.6
4  d  4 0.8
5  e  5   1

All columns in our example data have the character class:

sapply(data, class)
#          x1          x2          x3 
# "character" "character" "character"

I could convert each column to the desired class manually. However, for large data sets this might not be efficient.

Is there a way to automatically scan the values in each column and convert the corresponding column to a better class?

In this example, column x2 contains integers and column x3 contains decimal numbers. The desired classes would hence look like this:

sapply(data, class)
#          x1          x2          x3 
# "character"   "integer"   "numeric"

Answer:

Use type.convert(). Setting as.is = TRUE keeps character columns as character instead of coercing them to factors.

data <- data |> type.convert(as.is=TRUE)

str(data)    
# 'data.frame': 5 obs. of  3 variables:
# $ x1: chr  "a" "b" "c" "d" ...
# $ x2: int  1 2 3 4 5
# $ x3: num  0.2 0.4 0.6 0.8 1

For R < 4.1, where the native pipe |> is not available:

data <- type.convert(data, as.is=TRUE)
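
If some columns use a custom missing-value marker, the na.strings argument of type.convert() may help. The following is a minimal sketch, not part of the original answer: it rebuilds the example data, checks the resulting classes, and then uses a hypothetical data frame `raw` with a made-up "n/a" marker to show how a single non-numeric string keeps a column as character unless it is declared as missing.

```r
# Rebuild the example data (stringsAsFactors = FALSE keeps this
# reproducible on R versions before 4.0 as well).
data <- data.frame(x1 = letters[1:5],
                   x2 = as.character(1:5),
                   x3 = as.character(seq(0.2, 1, 0.2)),
                   stringsAsFactors = FALSE)

# Convert without the pipe so it runs on older R versions too.
converted <- type.convert(data, as.is = TRUE)
sapply(converted, class)
#          x1          x2          x3 
# "character"   "integer"   "numeric"

# Hypothetical data with a custom missing-value marker ("n/a").
raw <- data.frame(id = c("1", "2", "n/a"), stringsAsFactors = FALSE)

# Without na.strings, "n/a" is ordinary text, so the column stays character.
plain <- type.convert(raw, as.is = TRUE)
class(plain$id)
# "character"

# Declaring "n/a" as missing lets the column become integer with an NA.
cleaned <- type.convert(raw, as.is = TRUE, na.strings = c("NA", "n/a"))
class(cleaned$id)
# "integer"
```

Extra arguments such as na.strings are passed on to the conversion of each column, so the same call works for a whole data frame.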