Load multiple RData files efficiently


I have a question about reading files into R. I have a folder with many files that I would like to combine into a single data frame to work with. What is the most efficient way to read in a large amount of data (over 1000 files)? My code is below; it has been running for a day now and still has not read in all of the files.

data <- data.frame()

for (file in files) {
  path <- paste0("Data/", file, ".RData")
  if (file.exists(path)) {
    load(path)  # each file contains an object named file_data
    data <- dplyr::bind_rows(data, file_data)
  }
}

CodePudding user response:

You can list all the files, read them into a list, and bind them once at the end. Calling bind_rows() on a growing data frame inside a loop copies the accumulated data on every iteration, which is what makes your version so slow.

library(dplyr)

my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)

all_data <- lapply(my_files, load, envir = .GlobalEnv)

bind_rows(mget(unlist(all_data)))
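This works because load() invisibly returns the names of the objects it created, so unlist(all_data) is the vector of those names and mget() pulls them back out of the global environment as a list that bind_rows() can combine.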

A better solution than using mget() is to create a fresh environment with new.env(), load the files into it, and then convert it to a list:

my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)

temp <- new.env()
lapply(my_files, load, envir = temp)

all_data <- as.list(temp)
rm(temp)

bind_rows(all_data)
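One caveat with both versions, assuming every file stores its object under the same name (as the "# as file_data" comment in the question suggests): each load() into a shared environment overwrites the previous object, so only the last file would survive. A minimal sketch of a per-file variant that avoids this:

my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)

# load each file into its own environment and keep whatever objects it contains
all_data <- lapply(my_files, function(f) {
  e <- new.env()
  load(f, envir = e)
  as.list(e)
})

# flatten one level so bind_rows() sees a plain list of data frames
bind_rows(unlist(all_data, recursive = FALSE))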

CodePudding user response:

You can use the package fs to make a list of the files and then purrr's map_dfr() to combine them. I use this for csv and excel files; for .RData files you need a small wrapper around load(), because load() returns the names of the objects it created rather than the data itself.

library(fs)
library(tidyverse)

load_rdata <- function(path) {
  e <- new.env()
  nm <- load(path, envir = e)  # load() returns the names of the objects it created
  e[[nm[1]]]                   # return the (usually single) object itself
}

file_list <- dir_ls("Data", glob = "*.RData")
data <- file_list %>%
  map_dfr(load_rdata)
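The same load_rdata() wrapper also drops into the base-R answer above (lapply(my_files, load_rdata) followed by bind_rows()) if you would rather not pull in fs and the tidyverse.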