How to iterate over an XML file to extract some attributes?

Time:05-23

I'm trying to extract values from an XML file and save them as a dataframe. For each line element, I'd like to add the date from its parent chk element.

<?xml version="1.0" encoding="ISO-8859-1"?>
<sales>
    <chk no="xxx" date="xxxx" time="xxx" total="xxxx" debtor="xxxx" name="xxx" cardnumber="xxxxxxx" mobil="" >
        <line productId="xxxx" product="xxxx" productGroupId="xxx" productGroup="xxx" amount="x" price="xxx"  />
        <line productId="xxx" product="xxx" productGroupId="xxx" productGroup="xxx" amount="xx" price="xxxx"  />
    </chk>
    <chk no="xxx" date="xxxx" time="xx" total="xxxx" debtor="xxxx" name="xxxx" cardnumber="xxxx" mobil="xxxxx" >
        <line productId="xxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxx" amount="xxxx" price="xxxx"  />
        <line productId="xxxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxxx" amount="xxx" price="xxxxx"  />
    </chk>
</sales>

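import xml.etree.ElementTree as ET

# response is the HTTP response that carries the XML shown above
root = ET.fromstring(response.content)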

sales = []
for date in root.iter('chk'):
    sales.append(date.attrib)

lines = []
for line in root.iter('line'):
    lines.append(line.attrib)

I am able to extract the chk and line elements separately. How can I add the date of each chk to the entries in the lines list?

CodePudding user response:

Iterate over the line elements inside the chk loop, and call iter() on the current chk element (date in the code below) instead of on root. Something like this:

root = ET.fromstring(resp)

# 'date' is the current chk element; its line children are searched, not the whole tree
for date in root.iter('chk'):
    for line in date.iter('line'):
        print(date.attrib, line.attrib)
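
To get from the printed pairs to something that can be saved as a dataframe, the nested loop can collect one dict per line instead of printing. This is only a minimal sketch: the sample XML, its attribute values, and the names xml_text and rows are made up for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the same shape as the question's XML.
xml_text = """<sales>
    <chk no="1" date="2023-05-01">
        <line productId="10" price="2.50" />
        <line productId="11" price="4.00" />
    </chk>
    <chk no="2" date="2023-05-02">
        <line productId="12" price="1.25" />
    </chk>
</sales>"""

root = ET.fromstring(xml_text)

rows = []
for chk in root.iter('chk'):
    for line in chk.iter('line'):
        row = dict(line.attrib)        # copy, so the parsed tree is left untouched
        row['date'] = chk.get('date')  # None if the chk has no date attribute
        rows.append(row)

print(rows)
```

Since rows is a plain list of dicts, passing it to pandas.DataFrame(rows) should give one row per line element with an extra date column.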

CodePudding user response:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="ISO-8859-1"?>
<sales>
    <chk no="xxx" date="xxxx" time="xxx" total="xxxx" debtor="xxxx" name="xxx" cardnumber="xxxxxxx" mobil="" >
        <line productId="xxxx" product="xxxx" productGroupId="xxx" productGroup="xxx" amount="x" price="xxx"  />
        <line productId="xxx" product="xxx" productGroupId="xxx" productGroup="xxx" amount="xx" price="xxxx"  />
    </chk>
    <chk no="xxx" date="zzzz" time="xx" total="xxxx" debtor="xxxx" name="xxxx" cardnumber="xxxx" mobil="xxxxx" >
        <line productId="xxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxx" amount="xxxx" price="xxxx"  />
        <line productId="xxxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxxx" amount="xxx" price="xxxxx"  />
    </chk>
</sales>'''



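root = ET.fromstring(xml)
# Copy the date of each chk onto its direct line children (modifies the tree in place)
for chk in root.findall('.//chk'):
  for line in chk.findall('line'):
    line.attrib['date'] = chk.attrib['date']
# Print the modified tree to stdout to check the result
ET.dump(root)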
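
Because this approach writes the date into each line element itself, the loop from the question over root.iter('line') should then return the date along with the other attributes. A small sketch of that, with invented sample values and the hypothetical name xml_text:

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the element and attribute names follow the question.
xml_text = """<sales>
    <chk no="1" date="2023-05-01">
        <line productId="10" amount="2" />
    </chk>
    <chk no="2" date="2023-05-02">
        <line productId="11" amount="1" />
        <line productId="12" amount="5" />
    </chk>
</sales>"""

root = ET.fromstring(xml_text)

# Same in-place update as above: copy the parent's date onto each child line.
for chk in root.findall('.//chk'):
    for line in chk.findall('line'):
        line.set('date', chk.get('date'))

# The question's original collection step now picks up the date as well.
lines = [dict(line.attrib) for line in root.iter('line')]
print(lines)
```

Note that findall('line') only matches direct children of each chk, which is enough for the structure shown here.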
  