Converting unusual XML file to CSV using Python

I'm having an issue with my XML file. I would like to achieve the same as in: https://www.delftstack.com/howto/python/xml-to-csv-python/

However, my XML file looks a bit different, for example:

<students>
<student name="Rick Grimes" rollnumber="1" age="15"/>
<student name="Lori Grimes" rollnumber="2" age="16"/>
<student name="Judith Grimes" rollnumber="4" age="13"/>
</students>

The code specified in the link does not work with this formatting.

from xml.etree import ElementTree

tree = ElementTree.parse("input.xml")
root = tree.getroot()

for student in root:
    name = student.find("name").text
    roll_number = student.find("rollnumber").text
    age = student.find("age").text
    print(f"{name},{roll_number},{age}")

I have very little coding experience, so hoping someone on here can help me out.

Expected result:

Rick Grimes,1,15 Lori Grimes,2,16 Carl Grimes,3,14 Judith Grimes,4,13

Actual result:

AttributeError: 'NoneType' object has no attribute 'text'

CodePudding user response：

.text refers to the text between an element's opening and closing tags. To make it clear:

<student> text here </student>

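You don't have any text here because your tags are self-closing. What you are looking for is the element's attrib property, a dictionary holding the tag's XML attributes (it is described in the xml.etree.ElementTree documentation).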

Something like this should help you get what you're looking for:

for student in root:
    print(student.attrib)
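
To make the difference concrete, here is a minimal sketch that uses an inline copy of the XML from the question (the xml_string and first names are only for illustration). It shows why find("name") returns None and where the values actually live; the "email" attribute is a made-up example of a missing key.

```python
from xml.etree import ElementTree

# Inline copy of the question's XML; in practice this would be read from input.xml.
xml_string = """<students>
<student name="Rick Grimes" rollnumber="1" age="15"/>
<student name="Lori Grimes" rollnumber="2" age="16"/>
<student name="Judith Grimes" rollnumber="4" age="13"/>
</students>"""

root = ElementTree.fromstring(xml_string)
first = root[0]

# There is no <name> child element, so find() returns None.
# Calling .text on that None is what raises the AttributeError.
print(first.find("name"))

# A self-closing tag has no text content either.
print(first.text)

# The values are stored as attributes of the <student> element.
print(first.attrib)

# get() returns a default instead of raising KeyError when an attribute is missing.
# "email" is a hypothetical attribute that does not exist in the sample.
print(first.get("email", "n/a"))
```

Using get() with a default may be the safer choice if some student elements could lack an attribute.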

CodePudding user response：

You cannot get the text if there isn't any text to get. Since your values are stored as attributes, use .attrib[key] instead.

I have modified your example so that it will work with your XML file.

from xml.etree import ElementTree

tree = ElementTree.parse("input.xml")
root = tree.getroot()

for student in root:
    name = student.attrib["name"]
    roll_number = student.attrib["rollnumber"]
    age = student.attrib["age"]
    print(f"{name},{roll_number},{age}")

I hope this will help you.
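
If the goal is a real CSV rather than printed lines, one possible approach is the standard library's csv module, which also takes care of quoting values that contain commas. The sketch below is an assumption about what the output should look like (a header row followed by one row per student). It writes to an in-memory buffer so it can run as-is; the fields list and the output.csv file name mentioned in the comment are hypothetical.

```python
import csv
import io
from xml.etree import ElementTree

# Inline copy of the question's XML; in practice use ElementTree.parse("input.xml").
xml_string = """<students>
<student name="Rick Grimes" rollnumber="1" age="15"/>
<student name="Lori Grimes" rollnumber="2" age="16"/>
<student name="Judith Grimes" rollnumber="4" age="13"/>
</students>"""

root = ElementTree.fromstring(xml_string)

# Attribute names to export, in column order.
fields = ["name", "rollnumber", "age"]

# An in-memory buffer stands in for a file here. For a real file, something like
# open("output.csv", "w", newline="") could be used instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

writer.writerow(fields)  # header row
for student in root.iter("student"):
    # get() falls back to an empty string if an attribute is missing.
    writer.writerow([student.get(field, "") for field in fields])

csv_text = buffer.getvalue()
print(csv_text)
```

Dropping the writer.writerow(fields) line would give output without the header row.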

CodePudding user response：

import io
from xml.etree import ElementTree

xml_string = """<students>
        <student name="Rick Grimes" rollnumber="1" age="15"/>
        <student name="Lori Grimes" rollnumber="2" age="16"/>
        <student name="Judith Grimes" rollnumber="4" age="13"/>
        </students>"""

file = io.StringIO(xml_string)
tree = ElementTree.parse(file)
root = tree.getroot()

result = ""
for student in root:
    result += f"{student.attrib['name']},{student.attrib['rollnumber']},{student.attrib['age']} "
print(result)

Result:

Rick Grimes,1,15 Lori Grimes,2,16 Judith Grimes,4,13

CodePudding user response：

For such simply structured XML, you can also use the built-in pandas function read_xml and do it in two lines of code:

import pandas as pd

df = pd.read_xml('caroline.xml', xpath='.//student')
df.to_csv('caroline.csv', index=False)  # returns None when a path is given, so nothing to assign

# For visualization only
with open('caroline.csv', 'r') as f:
    lines = f.readlines()

for line in lines:
    print(line, end="")  # each line already ends with a newline

Output:

name,rollnumber,age
Rick Grimes,1,15
Lori Grimes,2,16
Judith Grimes,4,13

With the option header=False in to_csv, you can also leave the header row out of the CSV file.