Concatenate duplicate named XML tags in Python/Pandas

Time:06-02

Is there a way to concatenate the text from duplicate named tags?

Example XML:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor>"Austria"</neighbor>
        <neighbor>"Switzerland"</neighbor>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor>"Malaysia"</neighbor>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor>"Costa Rica"</neighbor>
        <neighbor>"Colombia"</neighbor>
    </country>
</data>

This is what I have so far:

from xml.etree import ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()

for acct_det in root.iter('neighbor'):
    print(acct_det.text)

What I want to do is build one concatenated string per country from its neighbor tags:

Austria Switzerland
Malaysia
Costa Rica Colombia

I'm having trouble finding a solution to accomplish this.

CodePudding user response:

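from lxml import etree

tree = etree.parse('tmp.xml')

# XPath 1.0 concat() takes a fixed list of arguments, so this joins only
# the first two <neighbor> elements of each <country>.
for country in tree.xpath('//country'):
    print(country.xpath('concat(./neighbor[1]/text(), " ", ./neighbor[2]/text())'))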

Result

"Austria" "Switzerland"
"Malaysia" 
"Costa Rica" "Colombia"

CodePudding user response:

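from xml.etree import ElementTree as ET

tree = ET.parse("sample.xml")
root = tree.getroot()

# Each direct child of <data> is a <country>; join all of its <neighbor> texts.
for country in root:
    # strip('"') drops the literal quotes stored in the sample XML.
    neighbors = " ".join(n.text.strip('"') for n in country.findall("neighbor"))
    print(neighbors)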
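
If the joined strings need to stay tied to their country, the same idea can be wrapped in a small helper. The following is only a sketch under a few assumptions: the function name `neighbors_by_country`, the `sep` parameter, and the inline `SAMPLE_XML` string (a trimmed copy of the question's data) are made up for illustration, and empty or missing neighbor text is simply skipped.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the XML from the question (assumption: only the
# <neighbor> tags matter for this example).
SAMPLE_XML = """<data>
    <country name="Liechtenstein">
        <neighbor>"Austria"</neighbor>
        <neighbor>"Switzerland"</neighbor>
    </country>
    <country name="Singapore">
        <neighbor>"Malaysia"</neighbor>
    </country>
    <country name="Panama">
        <neighbor>"Costa Rica"</neighbor>
        <neighbor>"Colombia"</neighbor>
    </country>
</data>"""


def neighbors_by_country(xml_text, sep=" "):
    """Map each country's name attribute to its joined neighbor texts.

    Hypothetical helper, not part of any library.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for country in root.iter("country"):
        # n.text is None for an empty tag, so fall back to "".
        names = [(n.text or "").strip().strip('"')
                 for n in country.findall("neighbor")]
        # Skip empty entries so no stray separators appear.
        result[country.get("name")] = sep.join(name for name in names if name)
    return result


joined = neighbors_by_country(SAMPLE_XML)
for name, neighbors in joined.items():
    print(f"{name}: {neighbors}")
```

Because the result is a plain dict, it can be passed to `pandas.Series(joined)` if a pandas object is what you are after.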