Home > OS >  Read a folder of excel files and import individual sheets as separate df's in R with Names
Read a folder of excel files and import individual sheets as separate df's in R with Names

Time:06-22

I have a folder of excel files that contain multiple sheets each. The sheets are named the same in each wb. I'm trying to import one specific named sheet for all excel files as separate data frames. I have been able to import them in; however, the names become df_1, df_2, df_3, etc... I've been trying to take the first word of the excel file name and use that to identify the df.

Example of Excel file Name "AAPL Multiple Sheets" the sheet would be named "Balance" I'm importing as a df. I would like "AAPL Balance df" as the result.

The code that came closest to what I'm looking for located below, however, it names each data frame as df_1, df_2, and so on.

library(purrr)
library(readxl)

files_list <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/',
pattern = "*.xlsx",full.names = TRUE)

files_list %>% 
    walk2(1:length(files_list),
          ~ assign(paste0("df_", .y), read_excel(path = .x), envir = globalenv()))

I tried using the file path variable 'file_list' in the past0 function to label them and ended up with,

df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx1, df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx2,

and so on.

I tried to make a list of file names to use. This read the file names and created a list but I couldn't make it work with the code above.

files_Names<-list.files(path='C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/', pattern=NULL, all.files=FALSE, full.names=FALSE)

Which resulted with this, "AAPL Analysis of Data.xlsx" for all the files in the list.

CodePudding user response:

I hope I could reproduce your example without code. I would create a function to have more control for the new filename.

I would suggest:

library(purrr)
library(readxl)
library(openxlsx)

target_folder <- 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data'

files_list <- list.files(path = target_folder,
                         pattern = "*.xlsx", full.names = TRUE)

tease_out <- function(file) {
  data <- read_excel(file, sheet = "Balance")
  filename <- basename(file) %>% tools::file_path_sans_ext()
  new_filename <- paste0(target_folder, "/", fileneame, "Balance df.xlsx")
  
  write.xlsx(data, file = new_filename)
}

map(file_list, tease_out)

Let me know if it works. I assume you are just targeting for the sheet "Balance"?

CodePudding user response:

You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):

library(openxlsx)
library(tidyverse)

Starting with your `files_list` we can do:

# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))

# Create a vector of names based on the first word of the filename   "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ] "), "_Balance_df")

# Assign the names to our list of dfs
names(list_of_dfs) <- df_names

# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, env = .GlobalEnv)
  • Related