I am looking to import a cache of .xlsx files from a local folder. This kind of one-stop-shop import has worked before with newer Excel workbooks, but the current cache consists of large workbooks (4 MB) from 2020.
When I use the following code, I receive the following error: BadZipFile: File is not a zip file.
However, none of the files are zip files. Could this be an encoding issue?
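import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)
files
# keep only the workbooks whose names end with '2020_File.xlsx'
files_xls2 = [f for f in files if f.endswith('2020_File.xlsx')]
files_xls2
sheet_name2 = '6 Commodities-A'
df2 = pd.DataFrame()
for f in files_xls2:
    # read the named sheet from each workbook
    data2 = pd.read_excel(os.path.join(path, f), sheet_name=sheet_name2, engine='openpyxl')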
CodePudding user response:
xlsx files are indeed zipped XML files. Your error suggests that one of your files might be corrupted, or is not a valid xlsx file.
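To find out which file is the culprit, you could test each one with the standard library's zipfile.is_zipfile before handing it to pandas. The following is only a minimal sketch: the helper name split_valid_xlsx is made up for illustration, and the suffix default is taken from the filter in the question. It also sets aside the '~$' lock files that Excel creates next to a workbook while it is open, since those would match an ends-with filter without being real workbooks. Whether that is what is happening in your folder is an assumption.

```python
import os
import zipfile


def split_valid_xlsx(folder, suffix="2020_File.xlsx"):
    """Return (valid, invalid) lists of file names in `folder` ending with `suffix`.

    Hypothetical helper for illustration; not part of pandas or openpyxl.
    """
    valid, invalid = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(suffix):
            continue
        full_path = os.path.join(folder, name)
        # Excel writes '~$...' lock files while a workbook is open;
        # they are not workbooks, so treat them as invalid.
        if name.startswith("~$") or not zipfile.is_zipfile(full_path):
            invalid.append(name)
        else:
            valid.append(name)
    return valid, invalid


if __name__ == "__main__":
    valid, invalid = split_valid_xlsx(os.getcwd())
    print("zip archives (worth passing to pd.read_excel):", valid)
    print("not zip archives (likely the source of BadZipFile):", invalid)
```

Any name that ends up in the second list is worth opening in Excel and re-saving as .xlsx, or excluding from the loop. Passing this check only shows that a file is a ZIP archive, not that it is a well-formed workbook.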
CodePudding user response:
Pandas does not work with external connections, which could be why your files are getting corrupted, as you mention yourself.