I (a noob) am currently trying to read a directory of .xlsm files into a pandas dataframe, with the intention of merging them all together into one big file. I've done similar tasks in the past with .csv files and had no problems, but this has me at a loss.
I'm currently running this:
import pandas as pd
import glob
import openpyxl
df = [pd.read_excel(filename,engine="openpyxl") for filename in glob.glob(r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets\*.xlsm')]
This solution has worked for me in the past. But here, when I run the above code, i get the following error:
zipfile.BadZipFile: File is not a zip file
Which is confusing me, because the file that I'm trying to access is not a zip file. Granted, there is a zip file with that same name in the same directory, but when I rename the file I'm referencing in my program to distinguish it from the zip file, I get the same error.
Anyone have any ideas? I've lurked for a long time and this is my first question, so apologies if it's not formatted in the proper way. Happy to provide more information as necessary. Thank you in advance!
UPDATE
This was fixed by excluding hidden files in the script, something I was unaware was happening.
path = r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets'
# read all the files with extension .xlsm i.e. excel
filenames = glob.glob(path "\[!~]*.xlsm")
# print('File names:', filenames)
# empty data frame for the new output excel file with the merged excel files
outputxlsx = pd.DataFrame()
# for loop to iterate all excel files
for file in filenames:
# using concat for excel files
# after reading them with read_excel()
df = pd.concat(pd.read_excel( file, ["BW_TimeSheet"]), ignore_index=True, sort=False)
df['Username'] = os.path.basename(file)
outputxlsx.append(df)
# appending data of excel files
outputxlsx = outputxlsx.append( df, ignore_index=True)
print('Final Excel sheet now generated at the same location:')
outputxlsx.to_excel(path "/Output.xlsx", index=False)
Thanks everyone for your help!
CodePudding user response:
Please delete the encryption of the file.
engine="openpyxl"
This does not support reading encrypted files.
CodePudding user response:
I refer to this issue.
This problem is related to excel and openpyxl. The best way is trying reading and writing to CSV.