In my XML file [studentinfo.xml], is there a way to loop through the file and rename specific tags (including specific child tags) by adding a number on the end? There will be multiple tags that need to change.
Note: the actual file is significantly larger than this sample.
<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
<stu:StudentScreening>
<st:name>Sam Davies</st:name>
<st:age>15</st:age>
<st:hair>Black</st:hair>
<st:eyes>Blue</st:eyes>
<st:grade>10</st:grade>
<st:teacher>Draco Malfoy</st:teacher>
<st:dorm>Innovation Hall</st:dorm>
<st:name>Master Splinter</st:name>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name>Cassie Stone</st:name>
<st:age>14</st:age>
<st:hair>Science</st:hair>
<st:grade>9</st:grade>
<st:teacher>Luna Lovegood</st:teacher>
<st:name>Kelly Clarkson</st:name>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name>Derek Brandon</st:name>
<st:age>17</st:age>
<st:eyes>green</st:eyes>
<st:teacher>Ron Weasley</st:teacher>
<st:dorm>Hogtie Manor</st:dorm>
<st:name>Miley Cyrus</st:name>
</stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>
Each tag should be unique within each StudentScreening, and I want to make them unique by adding a number on the end. See below for the desired output:
<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
<stu:StudentScreening>
<st:name0>Sam Davies</st:name0>
<st:age>15</st:age>
<st:hair>Black</st:hair>
<st:eyes>Blue</st:eyes>
<st:grade>10</st:grade>
<st:teacher>Draco Malfoy</st:teacher>
<st:dorm>Innovation Hall</st:dorm>
<st:name1>Master Splinter</st:name1>
<st:name2>Peter Griffin</st:name2>
<st:name3>Louis Griffin</st:name3>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name0>Cassie Stone</st:name0>
<st:age>14</st:age>
<st:hair>Science</st:hair>
<st:grade>9</st:grade>
<st:teacher>Luna Lovegood</st:teacher>
<st:name1>Kelly Clarkson</st:name1>
<st:name2>Stewie Griffin</st:name2>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name0>Derek Brandon</st:name0>
<st:age>17</st:age>
<st:eyes>green</st:eyes>
<st:teacher>Ron Weasley</st:teacher>
<st:dorm>Hogtie Manor</st:dorm>
<st:name1>Miley Cyrus</st:name1>
</stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>
CodePudding user response:
Well, I guess Harry Potter has got his magic.
The key idea is to use XSLT.
Python code using the lxml library:
import lxml.etree as ET
XSL = '''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:stu="https://www.example.com/harrypotter" xmlns:st="https://www.example.com/harrypotter">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="stu:StudentScreening/st:name">
<xsl:element name="st:name{count(preceding-sibling::st:name)}"><xsl:apply-templates select="@*|node()" /></xsl:element>
</xsl:template>
</xsl:stylesheet>
'''
dom = ET.parse('students.xml')
transform = ET.XSLT(ET.fromstring(XSL))
newdom = transform(dom)
print(ET.tostring(newdom))
newdom.write("out.xml", pretty_print=True)
Your input XML file, named students.xml, must declare the namespaces:
<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown xmlns:stu="https://www.example.com/harrypotter" xmlns:st="https://www.example.com/harrypotter">
<stu:Studentdata>
<stu:StudentScreening>
<st:name>Sam Davies</st:name>
<st:age>15</st:age>
<st:hair>Black</st:hair>
<st:eyes>Blue</st:eyes>
<st:grade>10</st:grade>
<st:teacher>Draco Malfoy</st:teacher>
<st:dorm>Innovation Hall</st:dorm>
<st:name>Master Splinter</st:name>
<st:name>Peter Griffin</st:name>
<st:name>Louis Griffin</st:name>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name>Cassie Stone</st:name>
<st:age>14</st:age>
<st:hair>Science</st:hair>
<st:grade>9</st:grade>
<st:teacher>Luna Lovegood</st:teacher>
<st:name>Kelly Clarkson</st:name>
<st:name>Stewie Griffin</st:name>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name>Derek Brandon</st:name>
<st:age>17</st:age>
<st:eyes>green</st:eyes>
<st:teacher>Ron Weasley</st:teacher>
<st:dorm>Hogtie Manor</st:dorm>
<st:name>Miley Cyrus</st:name>
</stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>
Run the Python code, and you should get a file named out.xml.
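If XSLT is not an option, the same per-parent numbering can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch under stated assumptions: the two namespace URIs, the inline SAMPLE string, and the helper name number_children are hypothetical and chosen for illustration, so substitute the URIs your file actually declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs (assumption); use the ones your file declares.
NS = {
    "stu": "https://www.example.com/stu",
    "st": "https://www.example.com/st",
}

# Small inline sample (assumption) so the sketch runs on its own.
SAMPLE = """<stu:StudentBreakdown xmlns:stu="https://www.example.com/stu" xmlns:st="https://www.example.com/st">
<stu:StudentScreening>
<st:name>Sam Davies</st:name>
<st:age>15</st:age>
<st:name>Master Splinter</st:name>
</stu:StudentScreening>
<stu:StudentScreening>
<st:name>Cassie Stone</st:name>
</stu:StudentScreening>
</stu:StudentBreakdown>"""


def number_children(root, parent_path, child_path, namespaces):
    """Append a per-parent index to the tag of every matching direct child."""
    for parent in root.iterfind(parent_path, namespaces):
        # findall returns a list of direct children, so renaming
        # elements while looping over it is safe.
        for i, child in enumerate(parent.findall(child_path, namespaces)):
            child.tag = f"{child.tag}{i}"
    return root


# Register the prefixes so the serializer keeps stu:/st: instead of ns0:/ns1:.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = number_children(
    ET.fromstring(SAMPLE), ".//stu:StudentScreening", "st:name", NS
)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The counter restarts for every StudentScreening because enumerate runs once per parent, which mirrors what count(preceding-sibling::st:name) does in the stylesheet. To number more tags, call number_children again with a different child_path.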
CodePudding user response:
In bs4, you can rename a Tag simply by assigning Tag.name = 'NEW_NAME', so you just have to enumerate and loop through.
(I pasted your first XML snippet into xmlStr.)
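from bs4 import BeautifulSoup

# Do NOT use the 'xml' parser here unless you want to lose namespaces
xSoup = BeautifulSoup(xmlStr, 'lxml')
# The 'lxml' (HTML) parser lowercases tag names, so match in lowercase
enumTags = ['st:name', 'stu:studentscreening']
for d in [c for c in xSoup.descendants if c.name]:
    for name in enumTags:
        # recursive=False: number only the direct children of d
        for i, t in enumerate(d.find_all(name, recursive=False)):
            t.name = f'{t.name}{i}'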
(You didn't number studentscreening in your question, but I wanted to give an example with multiple tags to number. Setting recursive=False reduces redundancy, as it restricts find_all to direct children only.)
Now, print(xSoup) will give the output:
<?xml version="1.0" encoding="UTF-8"?><html><body><stu:studentbreakdown>
<stu:studentdata>
<stu:studentscreening0>
<st:name0>Sam Davies</st:name0>
<st:age>15</st:age>
<st:hair>Black</st:hair>
<st:eyes>Blue</st:eyes>
<st:grade>10</st:grade>
<st:teacher>Draco Malfoy</st:teacher>
<st:dorm>Innovation Hall</st:dorm>
<st:name1>Master Splinter</st:name1>
</stu:studentscreening0>
<stu:studentscreening1>
<st:name0>Cassie Stone</st:name0>
<st:age>14</st:age>
<st:hair>Science</st:hair>
<st:grade>9</st:grade>
<st:teacher>Luna Lovegood</st:teacher>
<st:name1>Kelly Clarkson</st:name1>
</stu:studentscreening1>
<stu:studentscreening2>
<st:name0>Derek Brandon</st:name0>
<st:age>17</st:age>
<st:eyes>green</st:eyes>
<st:teacher>Ron Weasley</st:teacher>
<st:dorm>Hogtie Manor</st:dorm>
<st:name1>Miley Cyrus</st:name1>
</stu:studentscreening2>
</stu:studentdata>
</stu:studentbreakdown>
</body></html>
(You can also save it, to 'x.xml' for example, using: with open('x.xml', 'wb') as f: f.write(xSoup.prettify('utf-8')))