I'm dealing with changing formats in R.
I have 2 dataframes:
- The main one, df
- Another dataframe, tmp, which describes the column types of df and the New_format to which each column should be converted
Here is a reproducible example:
df <- data.frame(var1 = c("a", "b", "c"),
                 var2 = c(1, 2, 3),
                 var3 = c("d", "e", "f"))
tmp <- data.frame(Variable = c("var1", "var2", "var3"),
                  Format = c("character", "numeric", "character"),
                  New_format = c("character", "integer", "factor"))
I'd like to convert the type of every column whose New_format differs from its Format. I've struggled a lot with the lapply function but did not manage to do it.
Any ideas would be much appreciated :)
Thanks a lot!
CodePudding user response:
You could set up a named mapping between the New_format values and the corresponding as.<value> functions, like this:
funcs <- list("character" = as.character, "integer" = as.integer, "factor" = as.factor)
Then, in a loop, call the matching function on each column whose format changes:
for (i in seq_len(nrow(tmp))) {
  # Convert only when the target type differs from the current one
  if (tmp[i, "Format"] != tmp[i, "New_format"]) {
    col <- tmp[i, "Variable"]
    df[[col]] <- funcs[[tmp[i, "New_format"]]](df[[col]])
  }
}
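The same lookup can also be written without an explicit loop, closer to the lapply attempt mentioned in the question. This is only a minimal sketch: it assumes every value in New_format has a base R function named as.<value> (found here with match.fun), and the name todo is just an illustrative helper, not something from the question.

```r
# stringsAsFactors = FALSE keeps the type names as plain strings on R < 4.0 too
df <- data.frame(var1 = c("a", "b", "c"),
                 var2 = c(1, 2, 3),
                 var3 = c("d", "e", "f"),
                 stringsAsFactors = FALSE)
tmp <- data.frame(Variable = c("var1", "var2", "var3"),
                  Format = c("character", "numeric", "character"),
                  New_format = c("character", "integer", "factor"),
                  stringsAsFactors = FALSE)

# Keep only the rows where a conversion is actually requested
todo <- tmp[tmp$Format != tmp$New_format, ]

# Pair each affected column with its target type and apply as.<New_format>
df[todo$Variable] <- Map(
  function(col, fmt) match.fun(paste0("as.", fmt))(col),
  df[todo$Variable],
  todo$New_format
)

str(df)
```

Building the function name from the type name avoids maintaining the funcs list by hand, at the cost of failing with an error if New_format contains a name for which no as.<value> function exists.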
CodePudding user response:
Use readr::type_convert()
library(tidyverse)
# Build a compact type string from the first letter of each New_format value
types <- paste(str_sub(tmp$New_format, 1, 1), collapse = "")
new_df <- type_convert(df, col_types = types, guess_integer = TRUE)
str(new_df)
'data.frame': 3 obs. of 3 variables:
$ var1: chr "a" "b" "c"
$ var2: int 1 2 3
$ var3: Factor w/ 3 levels "d","e","f": 1 2 3
This function requires that the type specifications are passed in either as a cols() statement, or as a string with the new column type indicated by a single letter (e.g. "c" for character, "f" for factor, and so on).
So either rename the New_format labels to their single-letter versions ("c", "i", "f"), or use str_sub and paste on tmp to get the first letters (which type_convert wants for the col_types argument).
Note: Make sure to set guess_integer = TRUE; otherwise whole-number columns whose type is guessed default to double rather than integer.
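One caveat worth checking on your own data: the readr documentation describes type_convert() as re-converting character columns, so a column that is already numeric (like var2 here) may be left untouched. The sketch below is one possible workaround, not part of the original answer: it casts every column to character first so that each letter in the type string can be applied. The name df_chr is illustrative, and the readr call is guarded so the snippet still runs where readr is not installed.

```r
df <- data.frame(var1 = c("a", "b", "c"),
                 var2 = c(1, 2, 3),
                 var3 = c("d", "e", "f"),
                 stringsAsFactors = FALSE)
tmp <- data.frame(Variable = c("var1", "var2", "var3"),
                  Format = c("character", "numeric", "character"),
                  New_format = c("character", "integer", "factor"),
                  stringsAsFactors = FALSE)

# First letter of each target type, collapsed into readr's compact string
types <- paste(substr(tmp$New_format, 1, 1), collapse = "")

# Cast every column to character so that each one is eligible for conversion
df_chr <- df
df_chr[] <- lapply(df, as.character)

if (requireNamespace("readr", quietly = TRUE)) {
  new_df <- readr::type_convert(df_chr, col_types = types)
  str(new_df)
}
```

The first-letter trick only works when the letter matches readr's shortcut for the intended type ("c", "i", "f" here), so it is worth checking the letters against the cols() documentation before relying on it for other type names.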