Home > database >  Reading xlsx with multiple sheets in R for duplication removal
Reading xlsx with multiple sheets in R for duplication removal

Time:11-16

I have a excel file which has multiple sheets embedded in it. My main goal is to basically remove all rows which are appearing multiple times in a single sheet and have to do this for every sheet. I have written the code below but the code is only reading the first sheet and also giving ' ...' in first row and column. Can someone help me out where I might be going wrong. Thank you in advanced

**config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1] 
count_of_xl_sheets <- length(config_xl_sheets_name) 
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
  assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
  updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)**

CodePudding user response:

this is my approach

library(readxl)
library(data.table)
library(openxlsx)
file.to.read   <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)

# read sheets from the file to a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
  data <- setDT(readxl::read_excel(file.to.read, sheet = x))
  #remove puplicates
  data[!duplicated(data), ]
  })

# create a new workbook
wb <- createWorkbook()
# create new worksheets an write to them
for (i in seq.int(L)) {
  addWorksheet(wb, sheets.to.read[i])
  writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")
  • Related