I have an XML file, studentinfo.xml. Is there a way to loop through it and output only each tag and its value? I would like child tags to be displayed as well. The file, my desired output, and my current code are below. I am open to other solutions as well.
<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
  <stu:Studentdata>
    <stu:StudentScreening>
      <st:name>Sam Davies</st:name>
      <st:age>15</st:age>
      <st:hair>Black</st:hair>
      <st:eyes>Blue</st:eyes>
      <st:grade>10</st:grade>
      <st:teacher>Draco Malfoy</st:teacher>
      <st:dorm>Innovation Hall</st:dorm>
      <st:name>Master Splinter</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
      <st:name>Cassie Stone</st:name>
      <st:age>14</st:age>
      <st:hair>Science</st:hair>
      <st:grade>9</st:grade>
      <st:teacher>Luna Lovegood</st:teacher>
      <st:name>Kelly Clarkson</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
      <st:name>Derek Brandon</st:name>
      <st:age>17</st:age>
      <st:eyes>green</st:eyes>
      <st:teacher>Ron Weasley</st:teacher>
      <st:dorm>Hogtie Manor</st:dorm>
      <st:name>Miley Cyrus</st:name>
    </stu:StudentScreening>
  </stu:Studentdata>
</stu:StudentBreakdown>
Below is my desired output:
stu:StudentBreakdown :
stu:Studentdata :
stu:StudentScreening :
st:name : Sam Davies
st:age : 15
st:hair : Black
st:eyes : Blue
st:grade : 10
st:teacher : Draco Malfoy
st:dorm : Innovation Hall
st:name : Master Splinter
..etc
Below is my current code:
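import pandas as pd
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# Raw string so the backslash in the Windows path is not read as an escape
mytree = ET.parse(r'path\studentinfo.xml').getroot()
rows = []  # renamed from "list" to avoid shadowing the built-in
for elm in mytree.iter():
    rows.append(elm.tag + ' : ' + str(elm.text))
print(rows)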
CodePudding user response:
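Your XML uses the prefixes stu and st without declaring them, so ElementTree cannot parse it as it stands. If I change the root element to <stu:StudentBreakdown xmlns:stu="stu" xmlns:st="st">, the following code works: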
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('ns.xml')
root = tree.getroot()

columns = ["TAG", "VALUE"]
data = []
for stud in root.iter():
    # Container elements hold only whitespace (or no text at all);
    # report those as None instead of a newline string
    text = stud.text.strip() if stud.text else ""
    data.append((stud.tag, text or None))

df = pd.DataFrame(data, columns=columns)
print(df)
Output:
                      TAG            VALUE
0   {stu}StudentBreakdown             None
1        {stu}Studentdata             None
2   {stu}StudentScreening             None
3                {st}name       Sam Davies
4                 {st}age               15
5                {st}hair            Black
6                {st}eyes             Blue
7               {st}grade               10
8             {st}teacher     Draco Malfoy
9                {st}dorm  Innovation Hall
10               {st}name  Master Splinter
11  {stu}StudentScreening             None
12               {st}name     Cassie Stone
13                {st}age               14
14               {st}hair          Science
15              {st}grade                9
16            {st}teacher    Luna Lovegood
17               {st}name   Kelly Clarkson
18  {stu}StudentScreening             None
19               {st}name    Derek Brandon
20                {st}age               17
21               {st}eyes            green
22            {st}teacher      Ron Weasley
23               {st}dorm     Hogtie Manor
24               {st}name      Miley Cyrus
Maybe there is a better way to manage the nested XML namespace definition.
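One possible way to get closer to the desired "prefix:tag : value" output without editing the file by hand is sketched below. It is only a sketch under assumptions: the helper names (declare_namespaces, qualified_name, tag_value_lines) are made up for illustration, the namespace URIs "stu" and "st" are the same placeholders used above rather than real URIs, and the declarations are injected with a simple regex on the first start tag, which may not suit every document.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder URIs (an assumption); a real document would declare proper ones.
NAMESPACES = {"stu": "stu", "st": "st"}


def declare_namespaces(xml_text, namespaces):
    """Append xmlns declarations to the first start tag (the root element)."""
    decls = "".join(f' xmlns:{prefix}="{uri}"' for prefix, uri in namespaces.items())
    # "<" followed by a name character skips "<?xml ...?>" and "<!-- ... -->".
    return re.sub(r"<[A-Za-z_][\w.:-]*", lambda m: m.group(0) + decls,
                  xml_text, count=1)


def qualified_name(tag, namespaces):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        for prefix, ns_uri in namespaces.items():
            if ns_uri == uri:
                return f"{prefix}:{local}"
    return tag


def tag_value_lines(xml_text, namespaces=NAMESPACES):
    """Return one 'prefix:tag : value' line per element, in document order."""
    root = ET.fromstring(declare_namespaces(xml_text, namespaces))
    lines = []
    for elem in root.iter():
        text = (elem.text or "").strip()  # container elements have no real text
        lines.append(f"{qualified_name(elem.tag, namespaces)} : {text}".rstrip())
    return lines


# Shortened copy of the question's XML, still without namespace declarations.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
  <stu:Studentdata>
    <stu:StudentScreening>
      <st:name>Sam Davies</st:name>
      <st:age>15</st:age>
    </stu:StudentScreening>
  </stu:Studentdata>
</stu:StudentBreakdown>"""

for line in tag_value_lines(sample):
    print(line)
```

For the real file, the text could be read first, for example with pathlib.Path("studentinfo.xml").read_text(encoding="utf-8"), and passed to tag_value_lines. Mapping the URI back to its prefix is done by hand here because ElementTree stores tags only in the {uri}local form after parsing.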