Home > Enterprise >  Reading encrypted file with pandas
Reading encrypted file with pandas

Time:03-30

I have this excel file that I've downloaded from here and I want to read it into a pandas dataframe. If I open the file in Excel, it opens fine, no passwords asked, but if I try to modify it, it does ask for a password. If I try to open the file on pandas with the following code instead:

data_inctl = pd.read_excel(
    INCTL_DATA_PATH,
    sheet_name="Serie Histórica"
)

It gives me the following error:

XLRDError: Workbook is encrypted

I tried some of the solutions found here but I do not have a password for the file so none of them work.

The data itself is not encrypted since I can open it fine on excel. I can also copy and paste the data into another file from Excel without a problem, but that would not allow me to automate my ETL. Is there a way that I could get this data into a pandas dataframe, even though the file is protected from modification?

CodePudding user response:

Managed to do it after reading this comment from @zyndorsky and finding this package:

>>> import pandas as pd
>>> import io
>>> import msoffcrypto
>>> protected_excel_filepath = '/tmp/INCTL_0222.xls'
>>> unprotected_excel_content = io.BytesIO()
>>> with open(protected_excel_filepath, 'rb') as f:
...     msoffcryptoobj = msoffcrypto.OfficeFile(f)
...     msoffcryptoobj.load_key(password="VelvetSweatshop")
...     unprotected_excel_content = io.BytesIO()
...     msoffcryptoobj.decrypt(unprotected_excel_content)
...
>>> data = pd.read_excel(unprotected_excel_content)
>>> print(data)
    Unnamed: 0  Unnamed: 1  ... Unnamed: 10  Unnamed: 11
0          NaN         NaN  ...         NaN          NaN
1          NaN         NaN  ...         NaN          NaN
2          NaN         NaN  ...       50 km     0.234338
3          NaN         NaN  ...      400 km     0.229290
4          NaN         NaN  ...      800 km     0.226535
5          NaN         NaN  ...    2.400 km     0.224885
6          NaN         NaN  ...    6.000 km     0.224641
7          NaN         NaN  ...         NaN          NaN
...
  • Related