Home > OS >  Trying to read a directory of .xlsm files in pandas
Trying to read a directory of .xlsm files in pandas

Time:08-10

I (a noob) am currently trying to read a directory of .xlsm files into a pandas dataframe, with the intention of merging them all together into one big file. I've done similar tasks in the past with .csv files and had no problems, but this has me at a loss.

I'm currently running this:

import pandas as pd
import glob
import openpyxl

df = [pd.read_excel(filename,engine="openpyxl") for filename in glob.glob(r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets\*.xlsm')]

This solution has worked for me in the past. But here, when I run the above code, i get the following error:

zipfile.BadZipFile: File is not a zip file

Which is confusing me, because the file that I'm trying to access is not a zip file. Granted, there is a zip file with that same name in the same directory, but when I rename the file I'm referencing in my program to distinguish it from the zip file, I get the same error.

Anyone have any ideas? I've lurked for a long time and this is my first question, so apologies if it's not formatted in the proper way. Happy to provide more information as necessary. Thank you in advance!

UPDATE

This was fixed by excluding hidden files in the script, something I was unaware was happening.

path = r'\\data\Designer\BI_Development\BI_2022_Objective\BIDataLake\MTT\Automation\TimeTrackingSheets_Automation\TimeTrackingSheets_Automation\TM_TimeTrackingSheets'

# read all the files with extension .xlsm i.e. excel 
filenames = glob.glob(path   "\[!~]*.xlsm")
# print('File names:', filenames)

# empty data frame for the new output excel file with the merged excel files
outputxlsx = pd.DataFrame()

# for loop to iterate all excel files
for file in filenames:
   # using concat for excel files
   # after reading them with read_excel()
    df = pd.concat(pd.read_excel( file, ["BW_TimeSheet"]), ignore_index=True, sort=False)
    df['Username'] = os.path.basename(file)
    outputxlsx.append(df)
    

   # appending data of excel files
    outputxlsx = outputxlsx.append( df, ignore_index=True)

print('Final Excel sheet now generated at the same location:')
outputxlsx.to_excel(path "/Output.xlsx", index=False)

Thanks everyone for your help!

CodePudding user response:

Please delete the encryption of the file.

engine="openpyxl"

This does not support reading encrypted files.

CodePudding user response:

I refer to this issue.

This problem is related to excel and openpyxl. The best way is trying reading and writing to CSV.

  • Related