Home > Software design >  R: Read multiple sheets from a XLS file and rbind them with skip rows and setnames
R: Read multiple sheets from a XLS file and rbind them with skip rows and setnames

Time:02-11

I have multiple sheets in an excel file and I will like to row bind all of them into one single dataframe. The first sheet has 3 rows that I have to skip, which looks something like this.

unneededrow1
unneededrow2
unneededrow3
Date      Category       CatID       Revenue
1/1/2022  Shop           1           1203
1/1/2022  Online         2           3264
2/1/2022  Shop           1           1423
2/1/2022  Online         2           2464

For Sheet2, Sheet3, Sheet4, and and onwards, I have data without column names, which is something like the following.

3/1/2022  Shop           1           2454
3/1/2022  Online         2           4333
4/1/2022  Shop           1           2234
4/1/2022  Online         2           4565   

My initial approach was to set colnames = FALSE for all sheets and rbind them but this result in mismatch of data types. I have looked up and tried other solutions but still couldn't achieve what I need. Appreciate any help here and thanks in advance.

CodePudding user response:

Lets have files e.g. doc1.xlsx, doc2.xlsx, doc3.xlsx and so on in the current working directory. Then you can the whole table like this:

library(tidyverse)
library(readxl)

tibble(path = list.files(".", ".xlsx")) %>%
  mutate(
    has_header = path == "doc1.xlsx",
    data = path %>% map2(has_header, ~ {
      if (.y) {
        read_excel(.x, skip = 3)
      } else {
        read_excel(.x, col_names = c("Date", "Category", "CatID", "Revenue"))
      }
    })
  ) %>%
  pull(data) %>%
  bind_rows()

If you have multiple sheets in the same file you can do this instead:

library(tidyverse)
library(readxl)

path <- "data.xlsx"
tibble(sheet = excel_sheets(path)) %>%
  mutate(
    has_header = sheet == "Sheet1",
    data = sheet %>% map2(has_header, ~ {
      if (.y) {
        read_excel(path, sheet = .x, skip = 3)
      } else {
        read_excel(path, sheet = .x, col_names = c("Date", "Category", "CatID", "Revenue"))
      }
    })
  ) %>%
  pull(data) %>%
  bind_rows()
  •  Tags:  
  • r
  • Related