Issues about the loop in RStudio

Time:03-02

I have more than 1000 csv files. I would like to combine them into a single file after running some processing. So I used a for loop as follows:

setwd("C:/....")
# Get the names of all the csv files in the current directory.
files <- dir(".", pattern = ".csv$")

for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(end = -5)
  assign(obj_name[i], read_csv(files[i]))
}

Up to this point, it works well.

I then tried to collect the imported files into a list so that I could manipulate them all at once, as follows:

command <- paste0("RawList <- list(", paste(obj_name, collapse = ","), ")")
eval(parse(text = command))

rm(i, obj_name, command, list = ls(pattern = "^g20"))
Ref_com_list = list()

Up to here, it is still okay. But ...

for (i in 1:length(RawList)) {
  df <- RawList[[i]] %>%
    pivot_longer(cols = -A, names_to = "B", values_to = "C") %>%
    mutate(time_sec = paste(YMD[i], B) %>% ymd_hms()) %>%
    mutate(minute = format(as.POSIXct(B, format = "%H:%M:%S"), "%M"))

  ...(some calculation)
  Ref_com_list[[i]] <- file_all
}

Ref_com_all <- do.call(rbind,Ref_com_list)

At that point, I got the following error:

Error: Can't combine A and B <datetime>. Run rlang::last_error() to see where the error occurred.

If I run the code on an individual file, it works well. But if I run it in the for loop, the error shows up. Could anyone tell me what the problem is?

Thanks a lot in advance.

CodePudding user response:

There is substantial scope for improvement in your code. Broadly speaking, if you are working in the tidyverse, you can pass a vector of file paths to read_csv directly (supported since readr 2.0.0). Example:

# Generate some sample files
tmp_dir <- fs::path_temp("some_csv_files")
fs::dir_create(tmp_dir)
for (i in 1:100) {
    readr::write_csv(mtcars, fs::file_temp(pattern = "cars",
                     tmp_dir = tmp_dir, ext = ".csv"))
}

# Actual file reading
dta_cars <- readr::read_csv(
    file = fs::dir_ls(path = tmp_dir, glob = "*.csv"),
    id = "file_path"
)

If you want to keep track of which file each row came from, using id = "file_path" in read_csv will store the file path in a column. This is arguably more efficient and less error-prone than:

for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(end = -5)
  assign(obj_name[i], read_csv(files[i]))
}

Reading everything in one call is much cleaner and will be faster than growing objects in a loop. Afterwards, you can proceed with your transformations:

dta_cars %>% ...
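
If a bind still fails with a "Can't combine ..." error like the one in the question, one likely cause (an assumption, since the files themselves are not shown) is that the same column was parsed as different types in different files. The base R sketch below reads the files into a named list without assign() or eval(parse()) and then reports the columns whose class differs between files. The demo files, the directory name csv_type_check and the variable names are hypothetical, and the check assumes every file has the same set of columns.

```r
# Hypothetical demo data: two small CSV files in which column B holds
# time-like strings in one file and plain numbers in the other.
tmp_dir <- file.path(tempdir(), "csv_type_check")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(A = 1:2, B = c("00:00:01", "00:00:02")),
          file.path(tmp_dir, "g2001.csv"), row.names = FALSE)
write.csv(data.frame(A = 1:2, B = c(1.5, 2.5)),
          file.path(tmp_dir, "g2002.csv"), row.names = FALSE)

# Read every file into one named list instead of creating one object per file.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
raw_list <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(raw_list) <- tools::file_path_sans_ext(basename(files))

# Matrix with one row per file and one column per variable;
# each cell holds the first class of that column in that file.
col_classes <- t(sapply(raw_list, function(df) {
  sapply(df, function(col) class(col)[1])
}))

# Names of the columns whose class is not identical across all files.
mismatched <- names(which(apply(col_classes, 2, function(x) length(unique(x)) > 1)))

print(col_classes)
print(mismatched)
```

Any column listed in mismatched is a candidate for an explicit type at read time, for example through the col_types argument of read_csv, so that every file is parsed the same way before binding.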

CodePudding user response:

Try:

library(data.table)

files <- list.files(path = '.', full.names = TRUE, pattern = '\\.csv$')

files_open <- lapply(files, function(x) fread(x, ...)) # ... for arguments like sep, dec, etc...

big_file <- rbindlist(files_open)

fwrite(big_file, ...) # ... for arguments like sep, dec, path to save data, etc...
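
As a rough sketch of the same approach with file provenance kept, the example below names the list elements after their files and lets rbindlist record those names through its idcol argument. The demo files, the directory name csv_rbindlist_demo and the column name source_file are hypothetical choices for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical demo files standing in for the real csv files.
tmp_dir <- file.path(tempdir(), "csv_rbindlist_demo")
dir.create(tmp_dir, showWarnings = FALSE)
fwrite(data.table(A = 1:2, B = c("x", "y")), file.path(tmp_dir, "first.csv"))
fwrite(data.table(A = 3:4, B = c("z", "w")), file.path(tmp_dir, "second.csv"))

# list.files returns the paths in alphabetical order.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and name the list elements after the file they came from.
files_open <- lapply(files, fread)
names(files_open) <- basename(files)

# With a named list, idcol stores the element name for every row;
# use.names = TRUE matches columns by name rather than by position.
big_file <- rbindlist(files_open, use.names = TRUE, idcol = "source_file")

print(big_file)
```

If some files lack columns that others have, rbindlist also accepts fill = TRUE, which pads the missing columns with NA instead of stopping with an error.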