r: read_csv, cols(): Specify multiple column types at once

Is it possible to specify multiple column types with one assignment in cols() from read_csv?

Instead of:

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           logi_one = 'l',
                           logi_two = 'l',
                           date_one = 'D',
                           date_two = 'D'))

I want to do something like

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           c(logi_one, logi_two) = 'l',
                           c(date_one, date_two) = 'D'))

CodePudding user response：

Here's a wrapper around readr::cols() that allows you to set types on multiple columns at once.

library(tidyverse)

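# Each argument name is a type shortcut and each value is an unquoted
# c(...) of column names, e.g. l = c(logi_one, logi_two).
my_cols <- function(..., .default = col_guess()) {
  dots <- enexprs(...)  # capture the c(...) calls without evaluating them
  colargs <- flatten_chr(unname(
    imap(dots, ~ {
      colnames <- syms(.x)[-1]             # drop the leading `c` symbol
      coltypes <- rep_along(colnames, .y)  # repeat the shortcut once per column
      purrr::set_names(coltypes, colnames)
    })
  ))
  cols(!!!colargs, .default = .default)  # splice in as name = shortcut pairs
}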

Example use:

set.seed(1)

# write sample .csv file
write_csv2(
  data.frame(
    int_one = sample(1:10, 10),
    logi_one = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_one = paste0("2022-01-", sample(10:31, 10)),
    int_two = sample(1:10, 10),
    logi_two = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_two = paste0("2022-02-", sample(10:28, 10))
  ),
  "my_file.csv"
)

read_csv2(
  "my_file.csv",
  col_types = my_cols(
    .default = 'i',
    l = c(logi_one, logi_two),
    D = c(date_one, date_two)
  )
)
#> # A tibble: 10 x 6
#>    int_one logi_one date_one   int_two logi_two date_two  
#>      <int> <lgl>    <date>       <int> <lgl>    <date>    
#>  1       9 TRUE     2022-01-18       1 FALSE    2022-02-15
#>  2       4 TRUE     2022-01-24       4 FALSE    2022-02-16
#>  3       7 TRUE     2022-01-14       3 FALSE    2022-02-19
#>  4       1 TRUE     2022-01-31       6 TRUE     2022-02-28
#>  5       2 TRUE     2022-01-23       2 TRUE     2022-02-17
#>  6       5 FALSE    2022-01-29       7 FALSE    2022-02-23
#>  7       3 FALSE    2022-01-26       5 TRUE     2022-02-11
#>  8      10 FALSE    2022-01-11       8 FALSE    2022-02-22
#>  9       6 FALSE    2022-01-19       9 FALSE    2022-02-25
#> 10       8 TRUE     2022-01-28      10 TRUE     2022-02-20

Created on 2022-03-05 by the reprex package (v2.0.1)
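If capturing unquoted column names feels heavy, a similar wrapper can take the column names as plain strings instead. This is only a minimal sketch and not the method above: cols_by_type() is a hypothetical name, and it relies on cols() accepting shortcut strings as named arguments, as in the question.

```r
library(readr)

# Hypothetical helper: `types` is a named list whose names are type shortcuts
# and whose values are character vectors of column names.
cols_by_type <- function(types, .default = col_guess()) {
  # Turn list(l = c("a", "b")) into c(a = "l", b = "l").
  spec <- unlist(lapply(names(types), function(type) {
    setNames(rep(type, length(types[[type]])), types[[type]])
  }))
  # Equivalent to calling cols(a = "l", b = "l", .default = .default).
  do.call(cols, c(as.list(spec), list(.default = .default)))
}

spec <- cols_by_type(
  list(l = c("logi_one", "logi_two"), D = c("date_one", "date_two")),
  .default = "i"
)
print(spec)
```

The resulting spec can be passed on in the usual way, e.g. read_csv2("my_file.csv", col_types = spec).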

CodePudding user response：

Here is another possibility, though it is a little more verbose. From ?read_csv, the col_types argument also accepts a compact string with one shortcut character per column (e.g., iiDl). So, given the columns whose types you want to change, we can build that string: read in the column names, bind them to a one-row tibble of the columns that need a non-default type, replace every NA with the default type (i), and collapse the row into a single string. That string is then passed as col_types to read_csv.

library(tidyverse)

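col_classes <-
  bind_rows(
    # zero-row tibble that keeps only the column names, in file order
    read_csv(my_file, col_types = cols(.default = "c"))[0, ],
    # one row holding the shortcut for each non-default column
    tibble(
      logi_one = 'l',
      logi_two = 'l',
      date_one = 'D',
      date_two = 'D'
    )
  ) %>%
  # columns not listed above are NA here; give them the default type
  mutate(across(everything(), ~ replace_na(., "i"))) %>%
  unlist(use.names = FALSE) %>%
  paste0(collapse = "")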

results <- read_csv(my_file, col_types = col_classes)

However, as written this would not work for read_csv2. But you could collapse every row back down into a single comma-separated string, like this:

output <-
  data.frame(apply(read_csv(my_file), 1, function(x)
    paste(x, collapse = ",")))

names(output) <- paste(names(results), collapse = ",")
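
The compact col_types string described in this answer can also be assembled in base R. The sketch below is only an illustration: build_col_string() and its arguments are hypothetical names, not part of readr, and reading the header with readLines() assumes plain, unquoted column names on the first line. The delimiter is a parameter, so the header of a semicolon-delimited file could be split with delim = ";".

```r
# Hypothetical helper (not part of readr): build a compact col_types string
# such as "ilD" from a file's header line.
build_col_string <- function(path, overrides, default = "i", delim = ",") {
  # Assumption: the first line holds plain, unquoted column names.
  header <- strsplit(readLines(path, n = 1), delim, fixed = TRUE)[[1]]
  # Start with the default shortcut for every column, in file order.
  types <- setNames(rep(default, length(header)), header)
  # Overwrite only the overrides that name a column present in the file.
  known <- intersect(names(overrides), header)
  types[known] <- overrides[known]
  paste(types, collapse = "")
}

# Small demonstration file using the column names from the question.
demo_file <- tempfile(fileext = ".csv")
writeLines(
  c("int_one,logi_one,date_one,int_two,logi_two,date_two",
    "9,TRUE,2022-01-18,1,FALSE,2022-02-15"),
  demo_file
)

col_string <- build_col_string(
  demo_file,
  overrides = c(logi_one = "l", logi_two = "l", date_one = "D", date_two = "D")
)
print(col_string)
```

The string can then be passed on as before, e.g. read_csv(demo_file, col_types = col_string).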