I have this excel file that I've downloaded from here and I want to read it into a pandas dataframe. If I open the file in Excel, it opens fine, no passwords asked, but if I try to modify it, it does ask for a password. If I try to open the file on pandas with the following code instead:
data_inctl = pd.read_excel(
INCTL_DATA_PATH,
sheet_name="Serie Histórica"
)
It gives me the following error:
XLRDError: Workbook is encrypted
I tried some of the solutions found here but I do not have a password for the file so none of them work.
The data itself is not encrypted since I can open it fine on excel. I can also copy and paste the data into another file from Excel without a problem, but that would not allow me to automate my ETL. Is there a way that I could get this data into a pandas dataframe, even though the file is protected from modification?
CodePudding user response:
Managed to do it after reading this comment from @zyndorsky and finding this package:
>>> import pandas as pd
>>> import io
>>> import msoffcrypto
>>> protected_excel_filepath = '/tmp/INCTL_0222.xls'
>>> unprotected_excel_content = io.BytesIO()
>>> with open(protected_excel_filepath, 'rb') as f:
... msoffcryptoobj = msoffcrypto.OfficeFile(f)
... msoffcryptoobj.load_key(password="VelvetSweatshop")
... unprotected_excel_content = io.BytesIO()
... msoffcryptoobj.decrypt(unprotected_excel_content)
...
>>> data = pd.read_excel(unprotected_excel_content)
>>> print(data)
Unnamed: 0 Unnamed: 1 ... Unnamed: 10 Unnamed: 11
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... 50 km 0.234338
3 NaN NaN ... 400 km 0.229290
4 NaN NaN ... 800 km 0.226535
5 NaN NaN ... 2.400 km 0.224885
6 NaN NaN ... 6.000 km 0.224641
7 NaN NaN ... NaN NaN
...