Unable to import large cache of .xlsx files - Receive error BadZipFile: File is not a zip file

Time:10-05

I am looking to import a cache of .xlsx files from a local folder. This kind of one-stop-shop import has worked before with newer Excel workbooks, but the current cache consists of large workbooks (4MB) from 2020.

When I run the code below, I get this error: BadZipFile: File is not a zip file.

However, none of the files are zip files. Could this be an encoding issue?

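```python
import os

import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# Keep only the 2020 workbooks, i.e. names ending in "2020_File.xlsx"
files_xls2 = [f for f in files if f.endswith('2020_File.xlsx')]
sheet_name2 = '6 Commodities-A'

frames = []
for f in files_xls2:
    # openpyxl opens .xlsx files as ZIP archives; this is where BadZipFile is raised
    data2 = pd.read_excel(os.path.join(path, f), sheet_name=sheet_name2, engine='openpyxl')
    frames.append(data2)

# Combine the sheets into one DataFrame (empty if no file matched)
df2 = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```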
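To find out which workbook actually triggers the error, the loop can catch `zipfile.BadZipFile` per file instead of stopping at the first failure. This is a minimal sketch: `read_workbooks` is a hypothetical helper, and the reader is passed in as a function so the loop itself does not depend on pandas.

```python
import zipfile
from typing import Callable, List, Tuple


def read_workbooks(paths: List[str], reader: Callable[[str], object]) -> Tuple[list, List[Tuple[str, str]]]:
    """Call reader(path) for each path, collecting results and recording files that fail.

    `read_workbooks` is a hypothetical helper, not part of pandas.
    """
    frames = []
    failed = []
    for p in paths:
        try:
            frames.append(reader(p))
        except zipfile.BadZipFile as exc:
            # The file could not be opened as a ZIP archive, which every .xlsx must be
            failed.append((p, str(exc)))
    return frames, failed
```

With pandas, it could be called as `read_workbooks(files_xls2, lambda f: pd.read_excel(f, sheet_name=sheet_name2, engine='openpyxl'))`. Afterwards, `failed` lists the offending file names and the other workbooks are still loaded.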

Answer:

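.xlsx files are in fact ZIP archives of XML files, so "File is not a zip file" means a file could not be opened as such an archive. Your error suggests that at least one of your files might be corrupted, or might not be a valid .xlsx file at all.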
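One way to test this diagnosis is to look at what each file really contains. A genuine .xlsx is a ZIP archive, while a legacy .xls (an OLE2 compound file) starts with the bytes `D0 CF 11 E0 A1 B1 1A E1`. The sketch below uses only the standard library. The helper name `sniff_format` and its labels are illustrative assumptions, and the markup check is a rough heuristic. One possibility worth ruling out is Excel's temporary owner files (names beginning with `~$`), which Excel creates while a workbook is open. They would match the `2020_File.xlsx` filter but are not ZIP archives.

```python
import zipfile
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # signature of legacy .xls / OLE2 files


def sniff_format(path) -> str:
    """Rough guess at what a file named *.xlsx really contains (illustrative helper)."""
    if zipfile.is_zipfile(path):
        return "zip"  # what openpyxl expects for .xlsx
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == OLE2_MAGIC:
        return "ole2"  # probably an old .xls saved with an .xlsx extension
    if head.lstrip().startswith(b"<"):
        return "text-markup"  # e.g. an HTML or XML export renamed to .xlsx
    return "unknown"  # e.g. a "~$" lock file or a truncated download


if __name__ == "__main__":
    # List every candidate workbook in the current folder with its detected format
    for f in sorted(Path(".").glob("*2020_File.xlsx")):
        print(f.name, sniff_format(f))
```

Any file reported as something other than `zip` will fail in `pd.read_excel(..., engine='openpyxl')` with the same BadZipFile error.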

Answer:

Pandas does not work with external connections, which could be why your files appear to be corrupted, as you mentioned yourself.
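Whether a workbook contains external connections can be checked without Excel by listing the parts inside the archive. In the Office Open XML format, data connections are normally stored in `xl/connections.xml`, and links to other workbooks are stored under `xl/externalLinks/`. This sketch assumes those conventional part names, and `external_parts` is a hypothetical helper. It only works on files that are valid ZIP archives, so on the files that fail in pandas it will itself raise `BadZipFile`.

```python
import zipfile


def external_parts(path) -> list:
    """List archive parts holding data connections or external workbook links.

    Parts under xl/externalLinks/ include their _rels relationship files.
    """
    with zipfile.ZipFile(path) as zf:
        return [
            name
            for name in zf.namelist()
            if name == "xl/connections.xml" or name.startswith("xl/externalLinks/")
        ]
```

An empty list suggests the workbook has no stored connections or links. In that case, external connections are unlikely to explain the error.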
