I am trying to write to a CSV
file the results of a query done in an XML
file, but I can't find a way to properly handle the results in a tabular format.
from lxml.etree import Element, XMLParser, parse
from typing import Any
import csv
p = XMLParser(huge_tree=True)
tree = parse("myfile.xml", parser=p)
root = tree.getroot()
result_values = []
result_values = root.findall("MYTAG")
with open('path/to/csv_file', 'w') as f:
csv_reader = csv.reader(f)
for result_value in result_values:
writer = csv.writer(f)
writer.writerows(result_value.items())
With the code above, I get them in this form (with incorrect data and repeated columns as rows):
column1 value
column2 value
column3 value
column1 value
column2 value
column3 value
....
What I need is the following, a csv with columns (originally the XML properties) as columns and rows for each of the matched results:
colum1 column2 column3 ...
value value value
value value value
How can I export this data to a csv file with the format above, without manually defining column/property names?
CodePudding user response:
Since your XML mentioned in comments is attribute-centric, consider parsing all attribute dictionaries using list comprehension. Then, write to CSV via DictWriter
:
with self.input().open() as f:
p = XMLParser(huge_tree=True)
tree = parse(f, parser=p)
root = tree.getroot()
# RETURN LIST OF ATTRIBUTE DICTIONARIES
result_values = [dict(n.attrib) for n in root.findall(".//MYTAG")]
# RETRIEVE UNIQUE KEYS FOR COLUMN HEADERS
keys = list(dict.fromkeys([k for dct in result_values for k in dct]))
with self.output().open(mode="w") as f:
# WRITE TO CSV VIA DICTWRITER
dw = csv.DictWriter(f, fieldnames=keys)
dw.writeheader()
dw.writerows(result_values)