Home > Enterprise >  How to format certain columns as numeric with dplyr
How to format certain columns as numeric with dplyr

Time:08-07

I have imported an excel spreadsheet into R and the data frame has numerous columns that should be numeric. I can format a named column as numeric as follows:

df$quantity <- as.numeric(df$quantity)

How would I do this for certain named columns? Here's an example data frame, though it doesn't have anywhere near as many columns as the real thing. Ideally the answer would use dplyr.

cols.to.format <- c("quantity", "li_hep", "edta")

df <- structure(list(source = c("Biobank", "Biobank", "Biobank", "Biobank", 
"Biobank"), sample_type = c("EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot", 
"EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot"
), quantity = c("10", "3", "8", "0", "7"), li_hep = c("0", "0", 
"0", "0", "0"), edta = c("2", "2", "0", "0", "0")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

Using across and all_of you could do;

library(dplyr, warn = FALSE)

cols.to.format <- c("quantity", "li_hep", "edta")

df %>%
  mutate(across(all_of(cols.to.format), as.numeric))
#> # A tibble: 5 × 5
#>   source  sample_type               quantity li_hep  edta
#>   <chr>   <chr>                        <dbl>  <dbl> <dbl>
#> 1 Biobank EDTA Plasma Large Aliquot       10      0     2
#> 2 Biobank EDTA Plasma Large Aliquot        3      0     2
#> 3 Biobank EDTA Plasma Large Aliquot        8      0     0
#> 4 Biobank EDTA Plasma Large Aliquot        0      0     0
#> 5 Biobank EDTA Plasma Large Aliquot        7      0     0

CodePudding user response:

I would use a loop for this:

for (col in cols.to.format) {
  df[[col]] <- as.numeric(df[[col]])
}

CodePudding user response:

Here's a solution if you don't know in advance which columns should be formatted as numeric (so it spares you the effort of sieving through your dataframe and jotting down all relevant columns names):

library(dplyr)
library(stringr)
df %>%
  mutate(across(where(~any(str_detect(., "^\\d $"))), as.numeric)) 
# A tibble: 5 × 5
  source  sample_type               quantity li_hep  edta
  <chr>   <chr>                        <dbl>  <dbl> <dbl>
1 Biobank EDTA Plasma Large Aliquot       10      0     2
2 Biobank EDTA Plasma Large Aliquot        3      0     2
3 Biobank EDTA Plasma Large Aliquot        8      0     0
4 Biobank EDTA Plasma Large Aliquot        0      0     0
5 Biobank EDTA Plasma Large Aliquot        7      0     0

The regex ^\\d $ in str_detect asserts the values must be numeric from string start ^ to string end $.

CodePudding user response:

You can format the named columns with mutate_at:

library(dplyr)
df %>%
  mutate_at(
    .vars = cols.to.format,
    .funs = as.numeric
  )
  • Related