I have to run hundreds of data checks every day and combine the results into a single spreadsheet so errors can be fixed. I'm attempting to write some python code that will do this automatically, but I also want to exclude any files that have no results. To make this a little more complicated, every excel file has headers, even if the associated SQL queries brings back no results.
This is what I have so far:
from pathlib import Path
import time
import xlwings as xw
SOURCE_DIR = [*filepath*]
excel_files = list(Path(SOURCE_DIR).glob('*.csv'))
combined_wb = xw.Book()
t = time.localtime()
timestamp = time.strftime('%Y-%m-%d', t)
for excel_file in excel_files:
wb = xw.Book(excel_file)
for sheet in wb.sheets:
sheet.api.Copy(After=combined_wb.sheets[0].api)
wb.close()
combined_wb.save([*filepath*]dailychecks_{timestamp}.xlsx")
if len(combined_wb.app.books) == 1:
combined_wb.app.quit()
else:
combined_wb.close()
This code combines everything in the given folder into one excel workbook, but I can't figure out how to skip or ignore the empty files. Any help or advice would be greatly appreciated.
CodePudding user response:
This is one way of checking if the sheet is empty:
for excel_file in excel_files:
wb = xw.Book(excel_file)
for sheet in wb.sheets:
# if it is not the case that the last cell of the sheet is A1 and A1 is empty
if not (sheet.range("A1").api.SpecialCells(11).Address == "$A$1") & (sheet.range("A1").value == None):
sheet.api.Copy(After=combined_wb.sheets[0].api)
wb.close()
combined_wb.save(f"{SOURCE_DIR}\\dailychecks_{timestamp}.xlsx")
The if not
statement can be broken down into two parts:
- Using the VBA "SpecialCells", is the last cell on the sheet
You can change that by modifying the value of the argument passed to
sheet_name
ofpandas.DataFrame.to_excel
.