Home > OS >  Writing parsed XML results to CSV
Writing parsed XML results to CSV

Time:04-18

I am trying to write to a CSV file the results of a query done in an XML file, but I can't find a way to properly handle the results in a tabular format.

from lxml.etree import Element, XMLParser, parse
from typing import Any
import csv

p = XMLParser(huge_tree=True)
tree = parse("myfile.xml", parser=p)
root = tree.getroot()

result_values = []
result_values = root.findall("MYTAG")

with open('path/to/csv_file', 'w') as f:
    csv_reader = csv.reader(f)
    for result_value in result_values:
        writer = csv.writer(f)
        writer.writerows(result_value.items())

With the code above, I get them in this form (with incorrect data and repeated columns as rows):

column1 value
column2 value
column3 value
column1 value
column2 value
column3 value
....

What I need is the following, a csv with columns (originally the XML properties) as columns and rows for each of the matched results:

colum1 column2 column3 ...
value   value   value
value   value   value

How can I export this data to a csv file with the format above, without manually defining column/property names?

CodePudding user response:

Since your XML mentioned in comments is attribute-centric, consider parsing all attribute dictionaries using list comprehension. Then, write to CSV via DictWriter:

with self.input().open() as f: 
    p = XMLParser(huge_tree=True) 
    tree = parse(f, parser=p) 
    root = tree.getroot() 

    # RETURN LIST OF ATTRIBUTE DICTIONARIES
    result_values = [dict(n.attrib) for n in root.findall(".//MYTAG")]

    # RETRIEVE UNIQUE KEYS FOR COLUMN HEADERS
    keys = list(dict.fromkeys([k for dct in result_values for k in dct]))

with self.output().open(mode="w") as f:
    # WRITE TO CSV VIA DICTWRITER
    dw = csv.DictWriter(f, fieldnames=keys)
    dw.writeheader()
    dw.writerows(result_values)
  • Related