Is it possible to specify multiple column types with one assignment in cols() from read_csv?
Instead of:
read_csv2(my_file,
          col_types = cols(.default = 'i',
                           logi_one = 'l',
                           logi_two = 'l',
                           date_one = 'D',
                           date_two = 'D'))
I want to do something like
read_csv2(my_file,
          col_types = cols(.default = 'i',
                           c(logi_one, logi_two) = 'l',
                           c(date_one, date_two) = 'D'))
CodePudding user response:
Here's a wrapper around readr::cols() that allows you to set types on multiple columns at once.
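library(tidyverse)
my_cols <- function(..., .default = col_guess()) {
  # capture each argument unevaluated, e.g. l = c(logi_one, logi_two)
  dots <- enexprs(...)
  colargs <- flatten_chr(unname(
    imap(dots, ~ {
      # .x is the c(...) call, .y is the type shortcut; [-1] drops the leading `c`
      colnames <- syms(.x)[-1]
      coltypes <- rep_along(colnames, .y)
      purrr::set_names(coltypes, colnames)
    })
  ))
  # splice the named shortcuts into cols() as individual arguments
  cols(!!!colargs, .default = .default)
}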
Example use:
set.seed(1)
# write sample .csv file
write_csv2(
  data.frame(
    int_one = sample(1:10, 10),
    logi_one = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_one = paste0("2022-01-", sample(10:31, 10)),
    int_two = sample(1:10, 10),
    logi_two = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_two = paste0("2022-02-", sample(10:28, 10))
  ),
  "my_file.csv"
)
read_csv2(
  "my_file.csv",
  col_types = my_cols(
    .default = 'i',
    l = c(logi_one, logi_two),
    D = c(date_one, date_two)
  )
)
#> # A tibble: 10 x 6
#>    int_one logi_one date_one   int_two logi_two date_two
#>      <int> <lgl>    <date>       <int> <lgl>    <date>
#>  1       9 TRUE     2022-01-18       1 FALSE    2022-02-15
#>  2       4 TRUE     2022-01-24       4 FALSE    2022-02-16
#>  3       7 TRUE     2022-01-14       3 FALSE    2022-02-19
#>  4       1 TRUE     2022-01-31       6 TRUE     2022-02-28
#>  5       2 TRUE     2022-01-23       2 TRUE     2022-02-17
#>  6       5 FALSE    2022-01-29       7 FALSE    2022-02-23
#>  7       3 FALSE    2022-01-26       5 TRUE     2022-02-11
#>  8      10 FALSE    2022-01-11       8 FALSE    2022-02-22
#>  9       6 FALSE    2022-01-19       9 FALSE    2022-02-25
#> 10       8 TRUE     2022-01-28      10 TRUE     2022-02-20
Created on 2022-03-05 by the reprex package (v2.0.1)
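The same grouping idea can also be sketched without tidy evaluation, by passing the column names as character vectors. This is only an illustrative variant, not part of readr: cols_by_type is a hypothetical helper name, and it assumes every argument other than .default is named with a type shortcut.

```r
library(readr)

# Hypothetical helper (not part of readr): each named argument maps a type
# shortcut to a character vector of column names.
cols_by_type <- function(..., .default = col_guess()) {
  groups <- list(...)
  # repeat each shortcut once per column in its group
  types <- rep(names(groups), lengths(groups))
  spec <- as.list(setNames(types, unlist(groups, use.names = FALSE)))
  # equivalent to cols(logi_one = "l", ..., .default = .default)
  do.call(cols, c(spec, list(.default = .default)))
}

spec <- cols_by_type(
  .default = "i",
  l = c("logi_one", "logi_two"),
  D = c("date_one", "date_two")
)
print(spec)
```

The resulting spec can be passed as col_types to read_csv2 in the same way as the output of my_cols.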
CodePudding user response:
Here is one possibility (though a little complicated and verbose). If you have a list of the columns that you want to change, then we can create a single string for the col_types. From the help for ?read_csv, the col_types argument can take a single string of column shortcuts (e.g., iiDl). Here, I read in the column names, then bind that to the list of columns that need to be changed. Then, I replace any NA with the default type, i, then I collapse all column types into a single string. Then, I use that to define the col_types in read_csv.
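library(tidyverse)
col_classes <-
  bind_rows(
    # zero-row tibble that carries only the column names of the file
    read_csv(my_file, col_types = cols(.default = "c"))[0, ],
    # one row holding the type shortcut for each column to override
    tibble(
      logi_one = 'l',
      logi_two = 'l',
      date_one = 'D',
      date_two = 'D'
    )
  ) %>%
  # columns without an override are NA here; give them the default type
  mutate(across(everything(), ~ replace_na(., "i"))) %>%
  # flatten the single row to a character vector, then to one string
  unlist() %>%
  paste0(collapse = "")
results <- read_csv(my_file, col_types = col_classes)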
However, this would not work as written for read_csv2. But you could collapse every row back down, like this:
output <-
  data.frame(apply(read_csv(my_file), 1, function(x)
    paste(x, collapse = ",")))
names(output) <- paste(names(results), collapse = ",")
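If the file is semicolon-delimited, another option worth trying is to use read_csv2 for both the header read and the final read. The following is a minimal sketch, assuming read_csv2 accepts the same compact col_types string as read_csv (both take a col_types argument). The inline sample data and the overrides vector are hypothetical and only there to make the example self-contained.

```r
library(readr)

# Hypothetical sample data in read_csv2 format (semicolon-delimited)
csv2_text <- paste0(
  "int_one;logi_one;date_one;logi_two\n",
  "1;TRUE;2022-01-18;FALSE\n",
  "2;FALSE;2022-01-24;TRUE\n"
)

# Hypothetical overrides: column name -> type shortcut
overrides <- c(logi_one = "l", logi_two = "l", date_one = "D")

# Read everything as character once, just to get the column names in file order
header <- names(read_csv2(csv2_text, col_types = cols(.default = "c")))

# Use the override where one exists, otherwise the default shortcut "i"
types <- ifelse(header %in% names(overrides), overrides[header], "i")
col_string <- paste(types, collapse = "")

# Second read with the assembled compact string
df2 <- read_csv2(csv2_text, col_types = col_string)
```

Building the string from a named vector keeps the overrides in one place, and the order of the shortcuts always follows the order of the columns in the file.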