I have a data frame that stores strings. Some columns hold values that can be interpreted as numbers, but they are still of class character. I want to automatically convert every column that can be interpreted as numeric to numeric. I can do this easily with mutate_if, but it produces NAs in every remaining character column. I would like to keep the original values in those columns.
# Reproducible example
library(dplyr)
df <- data.frame(Col1 = c("647", "237", "863", "236"),
                 Col2 = c("125", "623", "854", "234"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 Col4 = c("PWD", "CDL", "QOW", "DKC"))
df %>% mutate_if(is.character, as.numeric)
  Col1 Col2 Col3 Col4
1  647  125   NA   NA
2  237  623   NA   NA
3  863  854   NA   NA
4  236  234   NA   NA
Warning messages:
1: Problem while computing `..1 = across(, ~as.numeric(.))`.
ℹ NAs introduced by coercion
2: Problem while computing `..1 = across(, ~as.numeric(.))`.
ℹ NAs introduced by coercion
Desired output:
# Character strings still available
  Col1 Col2 Col3 Col4
1  647  125  ABC  PWD
2  237  623  BCA  CDL
3  863  854  DFL  QOW
4  236  234  KFD  DKC
str(df)
'data.frame': 4 obs. of 4 variables:
$ Col1: num 647 237 863 236
$ Col2: num 125 623 854 234
$ Col3: chr "ABC" "BCA" "DFL" "KFD"
$ Col4: chr "PWD" "CDL" "QOW" "DKC"
CodePudding user response:
A possible solution:
df <- type.convert(df, as.is = TRUE)
str(df)
#> 'data.frame': 4 obs. of 4 variables:
#> $ Col1: int 647 237 863 236
#> $ Col2: int 125 623 854 234
#> $ Col3: chr "ABC" "BCA" "DFL" "KFD"
#> $ Col4: chr "PWD" "CDL" "QOW" "DKC"
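Note that type.convert() returns integer columns here, while the desired output in the question shows num. If double columns matter, one possible follow-up step is to promote the integer columns afterwards. This is only a sketch and not part of the original answer; the name int_cols is introduced here for illustration.

```r
# Same example data as in the question
df <- data.frame(Col1 = c("647", "237", "863", "236"),
                 Col2 = c("125", "623", "854", "234"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 Col4 = c("PWD", "CDL", "QOW", "DKC"),
                 stringsAsFactors = FALSE)

# Let base R guess the column types; as.is = TRUE keeps strings as character
df <- type.convert(df, as.is = TRUE)

# Flag the integer columns (int_cols is an illustrative name)
int_cols <- vapply(df, is.integer, logical(1))

# Promote only those columns to double; all other columns stay untouched
df[int_cols] <- lapply(df[int_cols], as.numeric)

str(df)
```

After this step Col1 and Col2 should be reported as num, and Col3 and Col4 remain chr.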
CodePudding user response:
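You can check whether converting to numeric creates NA values. Use if/else rather than ifelse() here: the test is a single TRUE/FALSE, so ifelse() would return only the first element of the column and recycle it down every row.
library(dplyr)
df <- df |>
  mutate(
    across(
      everything(),
      \(col) {
        # Try the conversion quietly; non-numeric strings become NA
        num <- suppressWarnings(as.numeric(col))
        # Keep the original column if the conversion produced any NA
        if (anyNA(num)) col else num
      }
    )
  )
df
#   Col1 Col2 Col3 Col4
# 1  647  125  ABC  PWD
# 2  237  623  BCA  CDL
# 3  863  854  DFL  QOW
# 4  236  234  KFD  DKC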
sapply(df, class)
# Col1 Col2 Col3 Col4
# "numeric" "numeric" "character" "character"
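If the data may already contain missing values, checking for any NA after conversion would leave such a column as character. A possible variation, sketched here in base R, compares the number of NAs before and after the conversion instead. The helper name to_numeric_if_possible and the sample data are made up for illustration and are not from the answer above.

```r
# Hypothetical helper: convert a character vector to numeric only when
# the conversion does not introduce any new NA values
to_numeric_if_possible <- function(x) {
  # Leave non-character columns as they are
  if (!is.character(x)) return(x)
  # Try the conversion quietly; non-numeric strings become NA
  num <- suppressWarnings(as.numeric(x))
  # Same NA count before and after means every non-missing value parsed
  if (sum(is.na(num)) == sum(is.na(x))) num else x
}

# Illustrative data with a missing value in the numeric-looking column
df <- data.frame(Col1 = c("647", NA, "863", "236"),
                 Col3 = c("ABC", "BCA", "DFL", "KFD"),
                 stringsAsFactors = FALSE)

# df[] keeps the data frame structure while replacing each column
df[] <- lapply(df, to_numeric_if_possible)

sapply(df, class)
```

Here Col1 should come back as numeric with its NA preserved, while Col3 stays character.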
CodePudding user response:
Another option uses readr and is very similar to the base R option.
library(readr)
type_convert(df)
-- Column specification ------------------------------------
cols(
Col1 = col_double(),
Col2 = col_double(),
Col3 = col_character(),
Col4 = col_character()
)
  Col1 Col2 Col3 Col4
1  647  125  ABC  PWD
2  237  623  BCA  CDL
3  863  854  DFL  QOW
4  236  234  KFD  DKC