I wrote a script that prints who last modified excel documents in directories and subdirectories using:
xl.load_workbook(“excel_file”).properties
.
Output:
/usr/local/lib/python3.7/dist-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
<openpyxl.packaging.core.DocumentProperties object>
Parameters:
creator='Jay', title=None, description=None, subject=None, identifier=None, language=None, created=datetime.datetime(2021, 2, 19, 16, 30, 16), modified=datetime.datetime(2022, 8, 17, 13, 26, 36, 728070), lastModifiedBy=None, category=None, contentStatus=None, version=None, revision=None, keywords=None, lastPrinted=None
I tried xl.load_workbook(“excel_file”).properties.pickle.dump()
but I get an error.
How can I create a dataframe of “DocumentProperties” module, save “properties” to a variable or create a dictionary. Is there any way of storing “DocumentProperties” object so I can get everything at once so I can save it as metadata?
I could not find anything related to my question in the forum. I hope my question makes sense.
Thank you!
CodePudding user response:
Pass the parameters into a dict using vars, then you can use pd.DataFrame.from_dict to create a DataFrame from the parameters. You can use orient="index" to set the index column.
metadata = wb.properties
metadata_dict = vars(metadata)
metadata_df = pd.DataFrame.from_dict(metadata_dict, dtype = "str",
orient="index")