Home > Back-end >  Importing multiple excel files with multiple sheets and creating a new column based on sheet's
Importing multiple excel files with multiple sheets and creating a new column based on sheet's

Time:02-28

I am trying to upload multiple excels files (same excel format) and each of these files has multiple sheets, in nature each excel file has information about a hotel bookings, and each sheet is information for one month (f.i jan, feb, oct) and each file is the information of a different hotel name (f.i hilton, paris, Vac_in) to illustrate let's just say that we have 2 files in working directory, first file represents bookings for the Hilton Hotel and the second file is Paris Hotel, file one (Hilton) has 3 sheets named "jan","feb" and "mar" and so does file 2 (Paris Hotel)

Each sheet (regarding of the file is in.. always has the same column names) so by the end of this process I want all data to be imported into one full dataframe and I trying to add two new colomuns one that will have the name of the excel file and the other the name of the sheet it took the information from something like this:

ID COST "FILE NAME" "SHEET NAME"
A 5 Hilton Jan
X 8 Hilton Jan
B 1 Hilton Feb
C 2 Hilton Mar
R 59 Paris Jan
G 22 Paris Feb
U 10 Paris Mar
W 78 Paris Mar

I will have the above desired result because the first file had the name "Hilton" and had 2 rows of info in sheet one named "Jan", and one row of info in sheet "Feb" in the same way the second file had name "Paris" and had sheets named Jan, Feb and took all rows from them and put it into the full dataframe

I can successfully upload all the information from every file and its sheets using this code:

library(tidyverse)
library(fs)
library(readxl)

path <- fs::dir_ls(choose.dir())

read_all_files_&_sheets <- function(path) {
  path %>%
    excel_sheets() %>% 
    set_names() %>% 
    map_df(read_excel, path = path)
}

data <- path%>% 
  map(read_all_files_)

full_db<-bind_rows(data)

But I cant find a way to mutate and create the new columns "FILE NAME" and "SHEET NAME" I had tried for hours but I only get errors, I will be so thankful if you can kindly help me out or reference a blog or a book where I can learn how to do it, thank you so much

CodePudding user response:

How about changing your function slightly, like this:

read_all_files_<- function(path) {
  path %>% 
    excel_sheets() %>%
    set_names() %>% 
    map_df(~read_excel(.x, path = path) %>% mutate(file=path, sheet=.x))
}

Explanation:

Let's say the first file in your vector of paths (i.e. path[1]) is "<folder_structure>hilton.xlsx". Then, when you pass this string to read_all_files_(), the result of piping through set_names() is a named list of the sheets

  Jan   Feb   Mar 
"Jan" "Feb" "Mar" 

You then pipe these three sheet names to map_df(). While map_df(read_excel, path=path) will return the data for you, you need to adjust the function to additionally mutate the frame that map_df() is going to return. Therefore instead of .f=read_excel, we change this to .f=~read_excel(.x, path=path) %>% mutate(....), where the mutate part is adding those two columns, file and sheet, assigning the value of the path and the sheet, which is .x.

Hope this helps.

  • Related