Home > Back-end >  How to extract information from openpyxl.workbook.properties module to dataframe, string or dictiona
How to extract information from openpyxl.workbook.properties module to dataframe, string or dictiona

Time:08-17

I wrote a script that prints who last modified excel documents in directories and subdirectories using:

xl.load_workbook(“excel_file”).properties.

Output:

/usr/local/lib/python3.7/dist-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
<openpyxl.packaging.core.DocumentProperties object>
Parameters:
creator='Jay', title=None, description=None, subject=None, identifier=None, language=None, created=datetime.datetime(2021, 2, 19, 16, 30, 16), modified=datetime.datetime(2022, 8, 17, 13, 26, 36, 728070), lastModifiedBy=None, category=None, contentStatus=None, version=None, revision=None, keywords=None, lastPrinted=None

I tried xl.load_workbook(“excel_file”).properties.pickle.dump() but I get an error.

How can I create a dataframe of “DocumentProperties” module, save “properties” to a variable or create a dictionary. Is there any way of storing “DocumentProperties” object so I can get everything at once so I can save it as metadata?

I could not find anything related to my question in the forum. I hope my question makes sense.

Thank you!

CodePudding user response:

Pass the parameters into a dict using vars, then you can use pd.DataFrame.from_dict to create a DataFrame from the parameters. You can use orient="index" to set the index column.

metadata = wb.properties
metadata_dict = vars(metadata)
metadata_df = pd.DataFrame.from_dict(metadata_dict, dtype = "str", 
orient="index")
  • Related