Is there a way to loop through an xml file and change tags to make them unique?

Time:12-19

Is there a way to modify the loop I use on my XML file [studentinfo.xml]? The current loop [see below] works, but not the way I want. When it goes through the file to make the specified tag/tags unique, the numbering never gets past 0 [see the issue below]. The numbering must start over at the end of each Student Screening [stu:StudentScreening], and the loop should stop once it reaches the last Student Screening.

Multiple tags will need to be unique, which is why my code [see below] is set up the way it is. What am I doing wrong, and what is the fix? [Open to any and all options.]

Note: the file is very large.

#The Issue:

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:adminid>123321</st:adminid>
        <st:namelists>
          <sti:name0>Sam Davies</sti:name0>
        </st:namelists>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:namelists>
          <sti:name0>Master Splinter</sti:name0>
          <sti:name0>Peter Griffin</sti:name0>
          <sti:name0>Louis Griffin</sti:name0>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'1</ie:height>
            <ie:weight>185</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT12</dep:departmentid>
            <dep:departmentname>Aerospace Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Mar</ie:month>
            <ie:day>05</ie:day>
            <ie:year>2007</ie:year>
        </st:DateofBirth>
        <st:Roommate>
          <st:namelists>
            <sti:name0>Tony Tiger</sti:name0>
          </st:namelists>
        </st:Roommate>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>456654</st:adminid>
        <st:namelists>
          <sti:name0>Cassie Stone</sti:name0>
        </st:namelists>
        <st:age>14</st:age>
        <st:hair>Brown</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:namelists>
          <sti:name0>Kelly Clarkson</sti:name0>
          <sti:name0>Stewie Griffin</sti:name0>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Female</ie:gender>
            <ie:height>5'5</ie:height>
            <ie:weight>150</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT24</dep:departmentid>
            <dep:departmentname>Earth Sciences</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Apr</ie:month>
            <ie:day>24</ie:day>
            <ie:year>2006</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>789987</st:adminid>
        <st:namelists>
          <sti:name0>Derek Brandon</sti:name0>
        </st:namelists>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:namelists>
          <sti:name0>Miley Cyrus</sti:name0>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'5</ie:height>
            <ie:weight>198</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT16</dep:departmentid>
            <dep:departmentname>Mechanical Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Jan</ie:month>
            <ie:day>10</ie:day>
            <ie:year>2005</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

Each tag should be unique for each Student Screening [stu:StudentScreening] and I want to make them unique by adding a number on the end.

#Desired output:

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:adminid>123321</st:adminid>
        <st:namelists>
          <sti:name0>Sam Davies</sti:name0>
        </st:namelists>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:namelists>
          <sti:name1>Master Splinter</sti:name1>
          <sti:name2>Peter Griffin</sti:name2>
          <sti:name3>Louis Griffin</sti:name3>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'1</ie:height>
            <ie:weight>185</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT12</dep:departmentid>
            <dep:departmentname>Aerospace Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Mar</ie:month>
            <ie:day>05</ie:day>
            <ie:year>2007</ie:year>
        </st:DateofBirth>
        <st:Roommate>
          <st:namelists>
            <sti:name4>Tony Tiger</sti:name4>
          </st:namelists>
        </st:Roommate>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>456654</st:adminid>
        <st:namelists>
          <sti:name0>Cassie Stone</sti:name0>
        </st:namelists>
        <st:age>14</st:age>
        <st:hair>Brown</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:namelists>
          <sti:name1>Kelly Clarkson</sti:name1>
          <sti:name2>Stewie Griffin</sti:name2>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Female</ie:gender>
            <ie:height>5'5</ie:height>
            <ie:weight>150</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT24</dep:departmentid>
            <dep:departmentname>Earth Sciences</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Apr</ie:month>
            <ie:day>24</ie:day>
            <ie:year>2006</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>789987</st:adminid>
        <st:namelists>
          <sti:name0>Derek Brandon</sti:name0>
        </st:namelists>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:namelists>
          <sti:name1>Miley Cyrus</sti:name1>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'5</ie:height>
            <ie:weight>198</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT16</dep:departmentid>
            <dep:departmentname>Mechanical Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Jan</ie:month>
            <ie:day>10</ie:day>
            <ie:year>2005</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

#Current xml file:

<?xml version="1.0" encoding="UTF-8"?>
<stu:StudentBreakdown>
<stu:Studentdata>
    <stu:StudentScreening>
        <st:adminid>123321</st:adminid>
        <st:namelists>
          <sti:name>Sam Davies</sti:name>
        </st:namelists>
        <st:age>15</st:age>
        <st:hair>Black</st:hair>
        <st:eyes>Blue</st:eyes>
        <st:grade>10</st:grade>
        <st:teacher>Draco Malfoy</st:teacher>
        <st:dorm>Innovation Hall</st:dorm>
        <st:namelists>
          <sti:name>Master Splinter</sti:name>
          <sti:name>Peter Griffin</sti:name>
          <sti:name>Louis Griffin</sti:name>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'1</ie:height>
            <ie:weight>185</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT12</dep:departmentid>
            <dep:departmentname>Aerospace Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Mar</ie:month>
            <ie:day>05</ie:day>
            <ie:year>2007</ie:year>
        </st:DateofBirth>
        <st:Roommate>
          <st:namelists>
            <sti:name>Tony Tiger</sti:name>
          </st:namelists>
        </st:Roommate>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>456654</st:adminid>
        <st:namelists>
          <sti:name>Cassie Stone</sti:name>
        </st:namelists>
        <st:age>14</st:age>
        <st:hair>Brown</st:hair>
        <st:grade>9</st:grade>
        <st:teacher>Luna Lovegood</st:teacher>
        <st:namelists>
          <sti:name>Kelly Clarkson</sti:name>
          <sti:name>Stewie Griffin</sti:name>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Female</ie:gender>
            <ie:height>5'5</ie:height>
            <ie:weight>150</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT24</dep:departmentid>
            <dep:departmentname>Earth Sciences</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Apr</ie:month>
            <ie:day>24</ie:day>
            <ie:year>2006</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
    <stu:StudentScreening>
        <st:adminid>789987</st:adminid>
        <st:namelists>
          <sti:name>Derek Brandon</sti:name>
        </st:namelists>
        <st:age>17</st:age>
        <st:eyes>green</st:eyes>
        <st:teacher>Ron Weasley</st:teacher>
        <st:dorm>Hogtie Manor</st:dorm>
        <st:namelists>
          <sti:name>Miley Cyrus</sti:name>
        </st:namelists>
        <st:status>Full Time</st:status>
        <st:description>
            <ie:gender>Male</ie:gender>
            <ie:height>6'5</ie:height>
            <ie:weight>198</ie:weight>
        </st:description>
        <st:department>
            <dep:departmentid>IDEPT16</dep:departmentid>
            <dep:departmentname>Mechanical Engineering</dep:departmentname>
        </st:department>
        <st:DateofBirth>
            <ie:month>Jan</ie:month>
            <ie:day>10</ie:day>
            <ie:year>2005</ie:year>
        </st:DateofBirth>
    </stu:StudentScreening>
</stu:Studentdata>
</stu:StudentBreakdown>

#Current code:

from bs4 import BeautifulSoup

# Read the whole file into memory and parse it with the 'lxml' parser.
with open('studentinfo.xml', 'r') as f:
    file = f.read()
soup = BeautifulSoup(file, 'lxml')

enumTags = ['sti:name']
# Visit every tag d and number only its direct children (recursive=False).
# enumerate() therefore restarts at 0 for each parent element d.
for d in [c for c in soup.descendants if c.name]:
    for name in enumTags:
        for i, t in enumerate(d.find_all(name, recursive=False)):
            t.name = f'{t.name}{i}'
print(soup.prettify())

CodePudding user response:

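In XSLT, use an identity template plus the template below. Note that xsl:number starts counting at 1, so subtract 1 from $n if the suffixes must start at 0 as in the desired output.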

<xsl:template match="sti:name">
  <xsl:variable name="n">
    <xsl:number level="any" from="stu:StudentScreening"/>
  </xsl:variable>
  <xsl:element name="sti:name{$n}">
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>

By the way, telling us the file is "very large" is meaningless unless you say how large. It could be 1 MB or it could be 100 GB, and the right approach differs between the two.
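
If the file sits toward the large end of that range, a streaming parse may be worth trying. The following is only a sketch using Python's standard library: the namespace URIs are placeholders (the question's snippets do not show the xmlns declarations), and `renamed_screenings` is a hypothetical helper name. It handles one stu:StudentScreening at a time and restarts the counter for each one.

```python
import io
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the question's snippets omit the xmlns
# declarations, so these values are assumptions made for this sketch.
STU = "urn:example:stu"
STI = "urn:example:sti"


def renamed_screenings(source, parent=f"{{{STU}}}StudentScreening",
                       target=f"{{{STI}}}name"):
    """Yield each parent element after numbering its target descendants."""
    for _event, el in ET.iterparse(source, events=("end",)):
        if el.tag != parent:
            continue
        # list() freezes the matches before any tag is changed
        for i, node in enumerate(list(el.iter(target))):
            node.tag = f"{target}{i}"
        yield el
        el.clear()  # drop the subtree once the caller has consumed it


DEMO = b"""<stu:StudentBreakdown xmlns:stu="urn:example:stu"
    xmlns:sti="urn:example:sti">
  <stu:StudentScreening><sti:name>A</sti:name><sti:name>B</sti:name></stu:StudentScreening>
  <stu:StudentScreening><sti:name>C</sti:name></stu:StudentScreening>
</stu:StudentBreakdown>"""

for screening in renamed_screenings(io.BytesIO(DEMO)):
    # Print the local names of the direct children of each screening
    print([child.tag.split("}")[1] for child in screening])
```

The caller has to serialize or otherwise use each yielded element before asking for the next one, because the element is cleared as soon as the generator resumes.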

CodePudding user response:

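Try it like this:

# Each pair is (parent tags, tags to number inside each parent).
# Tag names are lowercase because the 'lxml' HTML parser lowercases them.
tagListPairs = [(['stu:studentscreening'], ['sti:name'])]
for parentTags, enumTags in tagListPairs:
    for p in soup.find_all(parentTags):
        for name in enumTags:
            # find_all searches all descendants of p, so the counter
            # runs across every st:namelists inside one screening.
            for i, t in enumerate(p.find_all(name)):
                t.name = f'{t.name}{i}'

tagListPairs should be in the format [(parentTags_1, enumTags_1), (parentTags_2, enumTags_2), ..., (parentTags_n, enumTags_n)], and both parentTags and enumTags should be lists of tag names.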
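
For comparison, here is a rough standard-library version of the same idea that takes the same tagListPairs format. It is a sketch under assumptions: the namespace URIs in `NS` are invented placeholders (the real file must declare its own), `number_tags` is a hypothetical helper name, and the sample document is a cut-down version of the one in the question. Because this uses an XML parser, tag names keep their original case.

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI mapping; the question's XML does not show its
# xmlns declarations, so these URIs are placeholders.
NS = {"stu": "urn:example:stu", "st": "urn:example:st", "sti": "urn:example:sti"}

SAMPLE = """<stu:StudentBreakdown xmlns:stu="urn:example:stu"
    xmlns:st="urn:example:st" xmlns:sti="urn:example:sti">
  <stu:Studentdata>
    <stu:StudentScreening>
      <st:namelists><sti:name>Sam Davies</sti:name></st:namelists>
      <st:namelists>
        <sti:name>Master Splinter</sti:name>
        <sti:name>Peter Griffin</sti:name>
      </st:namelists>
      <st:Roommate>
        <st:namelists><sti:name>Tony Tiger</sti:name></st:namelists>
      </st:Roommate>
    </stu:StudentScreening>
    <stu:StudentScreening>
      <st:namelists><sti:name>Cassie Stone</sti:name></st:namelists>
      <st:namelists><sti:name>Kelly Clarkson</sti:name></st:namelists>
    </stu:StudentScreening>
  </stu:Studentdata>
</stu:StudentBreakdown>"""


def number_tags(root, tag_list_pairs, ns=NS):
    """Suffix each listed tag with a counter that restarts per parent element."""
    for parent_tags, enum_tags in tag_list_pairs:
        for parent_tag in parent_tags:
            for parent in root.findall(f".//{parent_tag}", ns):
                for name in enum_tags:
                    # findall returns a list in document order, so renaming
                    # inside the loop does not disturb the iteration
                    for i, el in enumerate(parent.findall(f".//{name}", ns)):
                        el.tag = f"{el.tag}{i}"
    return root


# Keep the original prefixes when serializing instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = number_tags(ET.fromstring(SAMPLE), [(["stu:StudentScreening"], ["sti:name"])])
print(ET.tostring(root, encoding="unicode"))
```

The counter is kept per tag name, so two different tags listed in enumTags would each start from 0 inside the same parent.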
