Is there a way to parse my xml file displaying only tags and value?

Time:12-01

Is there a way to loop through my XML file (studentinfo.xml) and output only each tag and its value? I would like child tags to be displayed as well. The file contents are shown below. I am open to other solutions as well.

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:name>Sam Davies</st:name>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:name>Master Splinter</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Cassie Stone</st:name>
        <st:age>14</st:age>
        <st:hair>Science</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:name>Kelly Clarkson</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Derek Brandon</st:name>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:name>Miley Cyrus</st:name>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

Below is my desired output:

stu:StudentBreakdown : 
stu:Studentdata : 
  stu:StudentScreening : 
    st:name : Sam Davies
    st:age : 15
    st:hair : Black
    st:eyes : Blue
    st:grade : 10
    st:teacher : Draco Malfoy
    st:dorm : Innovation Hall
    st:name : Master Splinter

..etc

Below is my current code:

import pandas as pd
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

mytree = ET.parse(r'path\studentinfo.xml').getroot()

list = []
for elm in mytree.iter():
  list.append(elm.tag + ' : ' + str(elm.text))
  print(list)

CodePudding user response:

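The prefixes stu: and st: are not bound to any namespace in your file, so ElementTree refuses to parse it ("unbound prefix"). If I declare them on the root element, <stu:StudentBreakdown xmlns:stu="stu" xmlns:st="st">, and save the result as ns.xml, this code works: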

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('ns.xml')
root= tree.getroot()

columns= ["TAG", "VALUE"]
data = []
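for stud in root.iter():
    # Container elements hold only whitespace (newlines and indentation),
    # so report their value as None. Checking for None first avoids a
    # TypeError on empty elements, whose text is already None.
    text = stud.text
    if text is None or "\n" in text:
        text = None
    data.append((stud.tag, text))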
       
df = pd.DataFrame(data, columns=columns)
print(df)

Output:

                      TAG            VALUE
0   {stu}StudentBreakdown             None
1        {stu}Studentdata             None
2   {stu}StudentScreening             None
3                {st}name       Sam Davies
4                 {st}age               15
5                {st}hair            Black
6                {st}eyes             Blue
7               {st}grade               10
8             {st}teacher     Draco Malfoy
9                {st}dorm  Innovation Hall
10               {st}name  Master Splinter
11  {stu}StudentScreening             None
12               {st}name     Cassie Stone
13                {st}age               14
14               {st}hair          Science
15              {st}grade                9
16            {st}teacher    Luna Lovegood
17               {st}name   Kelly Clarkson
18  {stu}StudentScreening             None
19               {st}name    Derek Brandon
20                {st}age               17
21               {st}eyes            green
22            {st}teacher      Ron Weasley
23               {st}dorm     Hogtie Manor
24               {st}name      Miley Cyrus

Note that ElementTree reports each tag as {namespace}name rather than prefix:name. There may be a better way to handle the namespace declarations.
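To get closer to the indented "prefix:tag : value" layout asked for in the question, one option is to collect the prefix declarations while parsing and map each tag back to its prefixed form. The following is only a sketch under assumptions: the helper names (parse_with_prefixes, qualified_name, format_tree) are made up for illustration, the sample is a shortened copy of the question's data, and the namespace URIs "stu" and "st" are the same placeholders used above. Indentation here simply follows nesting depth, so it differs slightly from the hand-written desired output.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample of the question's data, with the namespace declarations
# added on the root element (the URIs "stu" and "st" are placeholders).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown xmlns:stu="stu" xmlns:st="st">
<stu:Studentdata>
    <stu:StudentScreening>
        <st:name>Sam Davies</st:name>
        <st:age>15</st:age>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>
"""


def parse_with_prefixes(source):
    """Parse XML and return (root, {uri: prefix}) as declared in the file."""
    uri_to_prefix = {}
    root = None
    # "start-ns" events carry a (prefix, uri) pair for each xmlns declaration.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            uri_to_prefix[uri] = prefix
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, uri_to_prefix


def qualified_name(tag, uri_to_prefix):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        prefix = uri_to_prefix.get(uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


def format_tree(elem, uri_to_prefix, depth=0):
    """Return one 'tag : value' line per element, indented by nesting depth."""
    text = (elem.text or "").strip()  # whitespace-only text becomes ""
    name = qualified_name(elem.tag, uri_to_prefix)
    lines = [f"{'  ' * depth}{name} : {text}"]
    for child in elem:
        lines.extend(format_tree(child, uri_to_prefix, depth + 1))
    return lines


root, prefixes = parse_with_prefixes(io.BytesIO(SAMPLE.encode("utf-8")))
output = format_tree(root, prefixes)
print("\n".join(output))
```

To run it against the real file, pass the file name (for example 'ns.xml') to parse_with_prefixes instead of the in-memory sample. The prefixes still have to be declared in the file; this only restores the prefix:tag spelling in the output.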
