I have more than 1000 CSV files that I would like to combine into a single file after some processing. So I used a loop, as follows:
setwd("C:/....")
files <- dir(".", pattern = ".csv$") # Get the names of all CSV files in the current directory.
for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(end = -5)
  assign(obj_name[i], read_csv(files[i]))
}
Up to this point, it works well.
I then tried to collect the imported data frames into a list so that I could manipulate them all at once, as follows:
command <- paste0("RawList <- list(", paste(obj_name, collapse = ","), ")")
eval(parse(text = command))
rm(i, obj_name, command, list = ls(pattern = "^g20"))
Ref_com_list = list()
Up to here, it is still okay. But ...
for (i in 1:length(RawList)) {
  df <- RawList[[i]] %>%
    pivot_longer(cols = -A, names_to = "B", values_to = "C") %>%
    mutate(time_sec = paste(YMD[i], B) %>% ymd_hms()) %>%
    mutate(minute = format(as.POSIXct(B, format = "%H:%M:%S"), "%M"))
  ...(some calculation)
  Ref_com_list[[i]] <- file_all
}
Ref_com_all <- do.call(rbind, Ref_com_list)
At that point, I got the following error:
Error: Can't combine A and B <datetime>.
Run rlang::last_error() to see where the error occurred.
If I run an individual file, it works well. But if I run it in the for loop, the error shows up. Could anyone tell me what the problem is?
Thanks a lot in advance.
CodePudding user response:
There is substantial scope for improvement in your code. Broadly speaking, if you are working in the tidyverse, you can pass multiple files to read_csv directly. Example:
# Generate some sample files
tmp_dir <- fs::path_temp("some_csv_files")
fs::dir_create(tmp_dir)
for (i in 1:100) {
  readr::write_csv(mtcars, fs::file_temp(pattern = "cars",
                                         tmp_dir = tmp_dir, ext = ".csv"))
}
# Actual file reading
dta_cars <- readr::read_csv(
  file = fs::dir_ls(path = tmp_dir, glob = "*.csv"),
  id = "file_path"
)
If you want to keep information on where each row came from, using id = "file_path" in read_csv will store the source path in a column. This is arguably more efficient and less error-prone than:
for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(end = -5)
  assign(obj_name[i], read_csv(files[i]))
}
Reading all files in a single call is much cleaner and will be faster than growing objects in a loop. Afterwards, you can proceed with your transformations:
dta_cars %>% ...
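As a rough sketch of what such a transformation could look like, the following recovers a short per-file label from the file_path column, which can stand in for the object names built with assign() in the question. It assumes readr 2.0.0 or later (for reading several files at once and for the id argument). The directory name, the sample files, and the column name source_name are made up for illustration and are not from the question.

```r
library(readr)
library(dplyr)

# Hypothetical sample data: three small CSV files in a temporary directory
tmp_dir <- file.path(tempdir(), "csv_label_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in c("g2001", "g2002", "g2003")) {
  write_csv(head(mtcars, 5), file.path(tmp_dir, paste0(nm, ".csv")))
}

csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read everything in one call; file_path records where each row came from
dta_all <- read_csv(csv_files, id = "file_path", show_col_types = FALSE)

# Derive a short label (file name without extension) from the stored path
dta_all <- dta_all %>%
  mutate(source_name = tools::file_path_sans_ext(basename(file_path)))

# Per-file summaries no longer need a loop
dta_all %>% count(source_name)
```

Because every row carries its source label, per-file steps can usually be written with group_by(source_name) instead of indexing into a list inside a for loop.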
CodePudding user response:
Try:
library(data.table)
# Anchor the pattern so only names ending in .csv are matched
files <- list.files(path = '.', full.names = TRUE, pattern = '\\.csv$')
files_open <- lapply(files, function(x) fread(x, ...)) # ... for arguments like sep, dec, etc...
big_file <- rbindlist(files_open)
fwrite(big_file, ...) # ... for arguments like sep, dec, path to save data, etc...
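A self-contained variant of the same idea is sketched below, in case it helps to see it run end to end. It additionally keeps track of which file each row came from through the idcol argument of rbindlist. The sample files, the column name source_file, and the output path are assumptions for illustration only.

```r
library(data.table)

# Hypothetical sample data: three small CSV files in a temporary directory
tmp_dir <- file.path(tempdir(), "csv_rbindlist_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in c("a", "b", "c")) {
  fwrite(head(iris, 4), file.path(tmp_dir, paste0(nm, ".csv")))
}

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each list element after its file so rbindlist() can record the origin
files_open <- lapply(files, fread)
names(files_open) <- basename(files)

# idcol adds a column holding the list names; fill = TRUE tolerates files
# whose column sets differ by filling the gaps with NA
big_file <- rbindlist(files_open, idcol = "source_file", fill = TRUE)

# Hypothetical output path, kept outside tmp_dir so a rerun does not read it back
out_path <- file.path(tempdir(), "combined_output.csv")
fwrite(big_file, out_path)
```

Using fill = TRUE is a judgment call: it avoids an error when files have different columns, but it can also hide a malformed file, so it may be worth checking the combined table for unexpected NA columns.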