I have imported an excel spreadsheet into R and the data frame has numerous columns that should be numeric. I can format a named column as numeric as follows:
df$quantity <- as.numeric(df$quantity)
How would I do this for certain named columns? Here's an example data frame, though it doesn't have anywhere near as many columns as the real thing. Ideally the answer would use dplyr.
cols.to.format <- c("quantity", "li_hep", "edta")
df <- structure(list(source = c("Biobank", "Biobank", "Biobank", "Biobank",
"Biobank"), sample_type = c("EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot",
"EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot"
), quantity = c("10", "3", "8", "0", "7"), li_hep = c("0", "0",
"0", "0", "0"), edta = c("2", "2", "0", "0", "0")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
Using across
and all_of
you could do;
library(dplyr, warn = FALSE)
cols.to.format <- c("quantity", "li_hep", "edta")
df %>%
mutate(across(all_of(cols.to.format), as.numeric))
#> # A tibble: 5 × 5
#> source sample_type quantity li_hep edta
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Biobank EDTA Plasma Large Aliquot 10 0 2
#> 2 Biobank EDTA Plasma Large Aliquot 3 0 2
#> 3 Biobank EDTA Plasma Large Aliquot 8 0 0
#> 4 Biobank EDTA Plasma Large Aliquot 0 0 0
#> 5 Biobank EDTA Plasma Large Aliquot 7 0 0
CodePudding user response:
I would use a loop for this:
for (col in cols.to.format) {
df[[col]] <- as.numeric(df[[col]])
}
CodePudding user response:
Here's a solution if you don't know in advance which columns should be formatted as numeric (so it spares you the effort of sieving through your dataframe and jotting down all relevant columns names):
library(dplyr)
library(stringr)
df %>%
mutate(across(where(~any(str_detect(., "^\\d $"))), as.numeric))
# A tibble: 5 × 5
source sample_type quantity li_hep edta
<chr> <chr> <dbl> <dbl> <dbl>
1 Biobank EDTA Plasma Large Aliquot 10 0 2
2 Biobank EDTA Plasma Large Aliquot 3 0 2
3 Biobank EDTA Plasma Large Aliquot 8 0 0
4 Biobank EDTA Plasma Large Aliquot 0 0 0
5 Biobank EDTA Plasma Large Aliquot 7 0 0
The regex ^\\d $
in str_detect
asserts the values must be numeric from string start ^
to string end $
.
CodePudding user response:
You can format the named columns with mutate_at:
library(dplyr)
df %>%
mutate_at(
.vars = cols.to.format,
.funs = as.numeric
)