Is there a way to loop through an xml file and change specific tags according to how many times that

Time:11-25

In my XML file (studentinfo.xml), is there a way to loop through the file and change specific tags, including specific child tags, by adding a number to the end? Several tags will need to change.

Note: the actual file is significantly larger.

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:name>Sam Davies</st:name>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:name>Master Splinter</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Cassie Stone</st:name>
        <st:age>14</st:age>
        <st:hair>Science</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:name>Kelly Clarkson</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Derek Brandon</st:name>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:name>Miley Cyrus</st:name>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

Each tag should be unique within each StudentScreening, and I want to make them unique by adding a number on the end. See below for the desired output:

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:name0>Sam Davies</st:name0>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:name1>Master Splinter</st:name1>
        <st:name2>Peter Griffin</st:name2>
        <st:name3>Louis Griffin</st:name3>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name0>Cassie Stone</st:name0>
        <st:age>14</st:age>
        <st:hair>Science</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:name1>Kelly Clarkson</st:name1>
        <st:name2>Stewie Griffin</st:name2>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name0>Derek Brandon</st:name0>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:name1>Miley Cyrus</st:name1>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

CodePudding user response:

Well, I guess Harry Potter has got his magic.

The key idea is to use XSLT.

Python code using the lxml library:

import lxml.etree as ET

XSL = '''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:stu="https://www.example.com/harrypotter" xmlns:st="https://www.example.com/harrypotter">
 <xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
 </xsl:template>
 <xsl:template match="stu:StudentScreening/st:name">
        <xsl:element name="st:name{count(preceding-sibling::st:name)}"><xsl:apply-templates select="@*|node()" /></xsl:element>
 </xsl:template>
</xsl:stylesheet>
'''

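# Parse the input document and compile the stylesheet.
dom = ET.parse('students.xml')
transform = ET.XSLT(ET.fromstring(XSL))

# Apply the transform; the result is a new tree, the input is left unchanged.
newdom = transform(dom)
print(ET.tostring(newdom, pretty_print=True).decode())

# Save the renamed document.
newdom.write("out.xml", pretty_print=True)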

Your input XML file, named students.xml here, must declare the stu and st namespaces on the root element:

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown xmlns:stu="https://www.example.com/harrypotter"  xmlns:st="https://www.example.com/harrypotter">
<stu:Studentdata>
    <stu:StudentScreening>
        <st:name>Sam Davies</st:name>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:name>Master Splinter</st:name>
        <st:name>Peter Griffin</st:name>
        <st:name>Louis Griffin</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Cassie Stone</st:name>
        <st:age>14</st:age>
        <st:hair>Science</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:name>Kelly Clarkson</st:name>
        <st:name>Stewie Griffin</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:name>Derek Brandon</st:name>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:name>Miley Cyrus</st:name>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

Run the Python code, and you should get a file named out.xml.
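
If lxml is not available, the same per-parent counting that count(preceding-sibling::st:name) performs in the XSLT can probably be approximated with the standard library. The following is only a minimal sketch: the two namespace URIs, the shortened SAMPLE document, and the helper name number_names are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the real document must declare its own.
NS = {
    "stu": "https://www.example.com/stu",
    "st": "https://www.example.com/st",
}

# Shortened, hypothetical sample in the shape of students.xml.
SAMPLE = """\
<stu:StudentBreakdown xmlns:stu="https://www.example.com/stu"
                      xmlns:st="https://www.example.com/st">
  <stu:Studentdata>
    <stu:StudentScreening>
      <st:name>Sam Davies</st:name>
      <st:age>15</st:age>
      <st:name>Master Splinter</st:name>
    </stu:StudentScreening>
    <stu:StudentScreening>
      <st:name>Cassie Stone</st:name>
      <st:name>Kelly Clarkson</st:name>
      <st:name>Stewie Griffin</st:name>
    </stu:StudentScreening>
  </stu:Studentdata>
</stu:StudentBreakdown>
"""


def number_names(root):
    """Rename each st:name to st:nameN, where N is the number of
    st:name siblings that precede it in the same StudentScreening."""
    name_tag = f"{{{NS['st']}}}name"  # Clark notation: {uri}localname
    for screening in root.iter(f"{{{NS['stu']}}}StudentScreening"):
        seen = 0  # st:name elements already renamed under this parent
        for child in list(screening):
            if child.tag == name_tag:
                child.tag = f"{name_tag}{seen}"
                seen += 1
    return root


# Register the prefixes so serialization uses st:/stu: instead of ns0:/ns1:.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = number_names(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

The counter restarts for every StudentScreening, which is what makes the numbering per student rather than global. Note that this sketch uses a different URI for each prefix, whereas the stylesheet above maps both prefixes to the same URI.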

CodePudding user response:

In bs4 (BeautifulSoup), you can rename a Tag simply by assigning Tag.name = 'NEW_NAME', so you just have to loop through the tags with enumerate.

(I pasted your first xml snippet to xmlStr.)

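from bs4 import BeautifulSoup

# do NOT use the 'xml' parser here unless you want to lose namespaces
xSoup = BeautifulSoup(xmlStr, 'lxml')

# Tag names to number; lowercase because the 'lxml' parser lowercases names.
enumTags = ['st:name', 'stu:studentscreening']
# Build the list of tags first, since names change inside the loop.
for d in [c for c in xSoup.descendants if c.name]:
    for name in enumTags:
        # recursive=False: number only the direct children of d
        for i, t in enumerate(d.find_all(name, recursive=False)):
            t.name = f'{t.name}{i}'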

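(You didn't number studentscreening in your question, but I wanted to give an example with several tags to number. Setting recursive=False restricts find_all to direct children only, which avoids redundant searches and keeps the numbering per parent.)

Now print(xSoup) gives the following output. The 'lxml' parser is an HTML parser, so it lowercases tag names and wraps the document in html and body tags: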

<?xml version="1.0" encoding="UTF-8"?><html><body><stu:studentbreakdown>
<stu:studentdata>
<stu:studentscreening0>
<st:name0>Sam Davies</st:name0>
<st:age>15</st:age>
<st:hair>Black</st:hair>
<st:eyes>Blue</st:eyes>
<st:grade>10</st:grade>
<st:teacher>Draco Malfoy</st:teacher>
<st:dorm>Innovation Hall</st:dorm>
<st:name1>Master Splinter</st:name1>
</stu:studentscreening0>
<stu:studentscreening1>
<st:name0>Cassie Stone</st:name0>
<st:age>14</st:age>
<st:hair>Science</st:hair>
<st:grade>9</st:grade>
<st:teacher>Luna Lovegood</st:teacher>
<st:name1>Kelly Clarkson</st:name1>
</stu:studentscreening1>
<stu:studentscreening2>
<st:name0>Derek Brandon</st:name0>
<st:age>17</st:age>
<st:eyes>green</st:eyes>
<st:teacher>Ron Weasley</st:teacher>
<st:dorm>Hogtie Manor</st:dorm>
<st:name1>Miley Cyrus</st:name1>
</stu:studentscreening2>
</stu:studentdata>
</stu:studentbreakdown>
</body></html>

You can also save the result, for example to 'x.xml':

with open('x.xml', 'wb') as f:
    f.write(xSoup.prettify('utf-8'))
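
If the lowercased names and the extra wrapper tags are a problem, the same enumerate-and-rename loop can be sketched with the standard library's xml.etree.ElementTree, which preserves case and adds no wrapper elements. This is a rough sketch under stated assumptions: ElementTree refuses undeclared prefixes, so the sample below declares two hypothetical namespace URIs, and number_children and ENUM_TAGS are illustrative names rather than anything from bs4.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, for illustration only.
STU = "https://www.example.com/stu"
ST = "https://www.example.com/st"

# Shortened, hypothetical sample with the namespaces declared.
XML = f"""\
<stu:Studentdata xmlns:stu="{STU}" xmlns:st="{ST}">
  <stu:StudentScreening>
    <st:name>Sam Davies</st:name>
    <st:age>15</st:age>
    <st:name>Master Splinter</st:name>
  </stu:StudentScreening>
  <stu:StudentScreening>
    <st:name>Cassie Stone</st:name>
  </stu:StudentScreening>
</stu:Studentdata>
"""

# Tags to number, in Clark notation ({uri}localname); case is preserved.
ENUM_TAGS = [f"{{{ST}}}name", f"{{{STU}}}StudentScreening"]


def number_children(root, tags):
    """Append a per-parent index to every direct child whose tag is in tags."""
    # Snapshot the elements first, because tags are rewritten during the loop.
    for parent in list(root.iter()):
        for tag in tags:
            # A bare tag in findall() matches direct children only.
            for i, child in enumerate(parent.findall(tag)):
                child.tag = f"{tag}{i}"


root = ET.fromstring(XML)
number_children(root, ENUM_TAGS)

# Print local names only, to show the renamed structure.
for screening in root:
    print(screening.tag.split("}")[1])
    for child in screening:
        print("   ", child.tag.split("}")[1], "=", child.text)
```

Here findall with a plain tag plays the role of find_all(name, recursive=False): it looks only at direct children, so each parent gets its own numbering starting at 0.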
