R: Convert all columns to numeric with mutate while maintaining character columns

Time:06-30

I have a data frame that stores strings. Some of those strings can be interpreted as numbers, but they are still of class character. I want to automatically convert every column that can be interpreted as numeric to numeric. I can do this easily with mutate_if, but it produces NAs in every remaining character column. I would like to keep the original values in those columns.

# Reproducible example
library(dplyr)

df <- data.frame(Col1 = c("647", "237", "863", "236"),
                 Col2 = c("125", "623", "854", "234"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 Col4 = c("PWD", "CDL", "QOW", "DKC"))

df %>% mutate_if(is.character, as.numeric)

  Col1 Col2 Col3 Col4
1  647  125   NA   NA
2  237  623   NA   NA
3  863  854   NA   NA
4  236  234   NA   NA
Warning messages:
1: Problem while computing `..1 = across(, ~as.numeric(.))`.
ℹ NAs introduced by coercion 
2: Problem while computing `..1 = across(, ~as.numeric(.))`.
ℹ NAs introduced by coercion 

Desired output:

# Character strings still available
  Col1 Col2 Col3 Col4
1  647  125  ABC  PWD
2  237  623  BCA  CDL
3  863  854  DFL  QOW
4  236  234  KFD  DKC

str(df)
'data.frame':   4 obs. of  4 variables:
 $ Col1: num  647 237 863 236
 $ Col2: num  125 623 854 234
 $ Col3: chr  "ABC" "BCA" "DFL" "KFD"
 $ Col4: chr  "PWD" "CDL" "QOW" "DKC"

User response:

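A possible solution is base R's type.convert(), which converts only the columns whose values can all be read as numbers. Note that whole numbers come back as integer rather than double:

df <- type.convert(df, as.is = TRUE)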
str(df)

#> 'data.frame':    4 obs. of  4 variables:
#>  $ Col1: int  647 237 863 236
#>  $ Col2: int  125 623 854 234
#>  $ Col3: chr  "ABC" "BCA" "DFL" "KFD"
#>  $ Col4: chr  "PWD" "CDL" "QOW" "DKC"
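
The desired output in the question shows `num` columns, while `type.convert()` returns `int` here. If double columns are needed, one way is to promote the integer columns afterwards. This is a minimal sketch using the question's data; the name `converted` is only for illustration.

```r
# Same example data as in the question
df <- data.frame(Col1 = c("647", "237", "863", "236"),
                 Col2 = c("125", "623", "854", "234"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 Col4 = c("PWD", "CDL", "QOW", "DKC"))

# Whole numbers are converted to integer, text columns stay character
converted <- type.convert(df, as.is = TRUE)

# Promote integer columns to double; leave every other column untouched.
# Assigning into converted[] keeps the data.frame structure.
converted[] <- lapply(converted, function(col) {
  if (is.integer(col)) as.numeric(col) else col
})

str(converted)
```

For most purposes integer columns behave like numeric ones, so this extra step may be unnecessary.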

User response:

You can check if converting to numeric creates NA values:

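df <- df |>
    mutate(
        across(
            everything(),
            \(col) {
                # Attempt the conversion once, silencing the coercion warning
                num <- suppressWarnings(as.numeric(col))
                # The test is a single TRUE/FALSE, so use if/else, not ifelse()
                if (anyNA(num)) col else num
            }
        )
    )

df
#   Col1 Col2 Col3 Col4
# 1  647  125  ABC  PWD
# 2  237  623  BCA  CDL
# 3  863  854  DFL  QOW
# 4  236  234  KFD  DKC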

sapply(df, class)
#        Col1        Col2        Col3        Col4
#   "numeric"   "numeric" "character" "character"
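
The same check can be wrapped in a reusable function. The sketch below is one possible version, not part of the original answer: the helper name `to_numeric_if_possible` is hypothetical, and the small data frame (with one missing value added) is made up for illustration. Unlike a plain NA check, it only counts NAs that the conversion itself introduces, so a column that already contains missing values can still be converted.

```r
# Hypothetical helper: returns a numeric vector only if every non-missing
# value parses as a number; otherwise returns the input unchanged.
to_numeric_if_possible <- function(col) {
  if (!is.character(col)) return(col)
  num <- suppressWarnings(as.numeric(col))
  # Keep the original if the conversion created any NA that was not there before
  if (any(is.na(num) & !is.na(col))) col else num
}

# Illustrative data: Col1 has a genuine missing value, Col3 is text
df <- data.frame(Col1 = c("647", "237", NA, "236"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"))

# Apply column by column; assigning into df[] keeps the data.frame structure
df[] <- lapply(df, to_numeric_if_possible)

sapply(df, class)
```

With dplyr loaded, the same helper could be passed to across(), for example `mutate(df, across(where(is.character), to_numeric_if_possible))`.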

User response:

Another option is type_convert() from readr, which works much like the base R type.convert(). It parses the numeric columns as double rather than integer.

library(readr)

type_convert(df)
-- Column specification ------------------------------------
cols(
  Col1 = col_double(),
  Col2 = col_double(),
  Col3 = col_character(),
  Col4 = col_character()
)

  Col1 Col2 Col3 Col4
1  647  125  ABC  PWD
2  237  623  BCA  CDL
3  863  854  DFL  QOW
4  236  234  KFD  DKC
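
The column specification shown above is emitted as a message each time the types are guessed. If that output is unwanted, for example inside a script, wrapping the call in `suppressMessages()` is one simple way to silence it. This is a sketch using the question's data; the name `converted` is only for illustration.

```r
library(readr)

# Same example data as in the question
df <- data.frame(Col1 = c("647", "237", "863", "236"),
                 Col2 = c("125", "623", "854", "234"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 Col4 = c("PWD", "CDL", "QOW", "DKC"))

# Guess the column types as before, but hide the column specification message
converted <- suppressMessages(type_convert(df))

str(converted)
```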