I have multiple sheets in an excel file and I will like to row bind all of them into one single dataframe. The first sheet has 3 rows that I have to skip, which looks something like this.
unneededrow1
unneededrow2
unneededrow3
Date Category CatID Revenue
1/1/2022 Shop 1 1203
1/1/2022 Online 2 3264
2/1/2022 Shop 1 1423
2/1/2022 Online 2 2464
For Sheet2, Sheet3, Sheet4, and and onwards, I have data without column names, which is something like the following.
3/1/2022 Shop 1 2454
3/1/2022 Online 2 4333
4/1/2022 Shop 1 2234
4/1/2022 Online 2 4565
My initial approach was to set colnames = FALSE for all sheets and rbind them but this result in mismatch of data types. I have looked up and tried other solutions but still couldn't achieve what I need. Appreciate any help here and thanks in advance.
CodePudding user response:
Lets have files e.g. doc1.xlsx
, doc2.xlsx
, doc3.xlsx
and so on in the current working directory. Then you can the whole table like this:
library(tidyverse)
library(readxl)
tibble(path = list.files(".", ".xlsx")) %>%
mutate(
has_header = path == "doc1.xlsx",
data = path %>% map2(has_header, ~ {
if (.y) {
read_excel(.x, skip = 3)
} else {
read_excel(.x, col_names = c("Date", "Category", "CatID", "Revenue"))
}
})
) %>%
pull(data) %>%
bind_rows()
If you have multiple sheets in the same file you can do this instead:
library(tidyverse)
library(readxl)
path <- "data.xlsx"
tibble(sheet = excel_sheets(path)) %>%
mutate(
has_header = sheet == "Sheet1",
data = sheet %>% map2(has_header, ~ {
if (.y) {
read_excel(path, sheet = .x, skip = 3)
} else {
read_excel(path, sheet = .x, col_names = c("Date", "Category", "CatID", "Revenue"))
}
})
) %>%
pull(data) %>%
bind_rows()