I'm trying to extract the values from an XML file and save them as a dataframe. For each line element, I'd like to add the date from its parent chk element.
<?xml version="1.0" encoding="ISO-8859-1"?>
<sales>
<chk no="xxx" date="xxxx" time="xxx" total="xxxx" debtor="xxxx" name="xxx" cardnumber="xxxxxxx" mobil="" >
<line productId="xxxx" product="xxxx" productGroupId="xxx" productGroup="xxx" amount="x" price="xxx" />
<line productId="xxx" product="xxx" productGroupId="xxx" productGroup="xxx" amount="xx" price="xxxx" />
</chk>
<chk no="xxx" date="xxxx" time="xx" total="xxxx" debtor="xxxx" name="xxxx" cardnumber="xxxx" mobil="xxxxx" >
<line productId="xxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxx" amount="xxxx" price="xxxx" />
<line productId="xxxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxxx" amount="xxx" price="xxxxx" />
</chk>
</sales>
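import xml.etree.ElementTree as ET

root = ET.fromstring(response.content)

# Attributes of every chk element
sales = []
for date in root.iter('chk'):
    sales.append(date.attrib)

# Attributes of every line element
lines = []
for line in root.iter('line'):
    lines.append(line.attrib)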
I am able to extract the chk and line elements separately. How can I add the date to each entry in the lines list?
CodePudding user response:
Iterate over the line elements inside the chk loop, calling iter on the current chk element (date) instead of on root. Something like this:
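import xml.etree.ElementTree as ET

root = ET.fromstring(resp)  # resp holds the XML text
for date in root.iter('chk'):
    # iterate over the lines of this chk only
    for line in date.iter('line'):
        print(date.attrib, line.attrib)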
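To end up with one record per line element, the nested loop can build a list of dicts instead of printing. A minimal sketch, assuming only the date is needed from chk; the sample XML values and the names xml_text and rows are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the same shape as the question's XML
xml_text = '''<sales>
<chk no="1" date="2021-01-01" time="10:00">
<line productId="10" product="apple" amount="2" price="3.50" />
<line productId="11" product="pear" amount="1" price="1.20" />
</chk>
<chk no="2" date="2021-01-02" time="11:30">
<line productId="12" product="plum" amount="5" price="0.80" />
</chk>
</sales>'''

root = ET.fromstring(xml_text)

rows = []
for chk in root.iter('chk'):
    for line in chk.iter('line'):
        # Copy the line attributes and add the parent's date,
        # leaving the parsed tree itself unchanged
        rows.append({**line.attrib, 'date': chk.get('date')})

print(rows[0])
```

If pandas is installed, pandas.DataFrame(rows) should turn this list into a dataframe with one row per line element.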
CodePudding user response:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="ISO-8859-1"?>
<sales>
<chk no="xxx" date="xxxx" time="xxx" total="xxxx" debtor="xxxx" name="xxx" cardnumber="xxxxxxx" mobil="" >
<line productId="xxxx" product="xxxx" productGroupId="xxx" productGroup="xxx" amount="x" price="xxx" />
<line productId="xxx" product="xxx" productGroupId="xxx" productGroup="xxx" amount="xx" price="xxxx" />
</chk>
<chk no="xxx" date="zzzz" time="xx" total="xxxx" debtor="xxxx" name="xxxx" cardnumber="xxxx" mobil="xxxxx" >
<line productId="xxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxx" amount="xxxx" price="xxxx" />
<line productId="xxxxx" product="xxxxx" productGroupId="xxxx" productGroup="xxxx" amount="xxx" price="xxxxx" />
</chk>
</sales>'''
root = ET.fromstring(xml)
# Copy the date of each chk onto its child line elements
for chk in root.findall('.//chk'):
    for line in chk.findall('line'):
        line.attrib['date'] = chk.attrib['date']
ET.dump(root)
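Since the loop above writes the date into each line element, the collection loop from the question then picks it up without further changes. A short sketch with made-up sample values (xml_text is a hypothetical name); dict(line.attrib) copies the attributes so later edits to the list do not touch the tree:

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the same shape as the question's XML
xml_text = '''<sales>
<chk no="1" date="2021-01-01">
<line productId="10" amount="2" />
</chk>
<chk no="2" date="2021-01-02">
<line productId="11" amount="1" />
<line productId="12" amount="4" />
</chk>
</sales>'''

root = ET.fromstring(xml_text)

# Annotation step: copy the date from each chk onto its line children
for chk in root.findall('.//chk'):
    for line in chk.findall('line'):
        line.set('date', chk.get('date'))

# Collect the annotated line elements as plain dicts
lines = [dict(line.attrib) for line in root.iter('line')]
print(lines)
```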