Printing XML File Results using if with Python

I have this xml file, and I am trying to print all the countries that have rank == 2. Also, I am trying to print all the countries with neighbor == E.

<?xml version="1.0"?>
<data>
    <country>
       <countryname>Canada</countryname>
       <rank>2</rank>
       <year>2008</year>
       <gdppc>141100</gdppc>
       <neighbor> E</neighbor>       
    </country>
    <country>
       <countryname>USA</countryname> 
       <rank>1</rank>
       <year>2010</year>
       <gdppc>121100</gdppc>
       <neighbor> A</neighbor>       
    </country>
    <country>
       <countryname>Mexico</countryname>
       <rank>2</rank>
       <year>2011</year>
       <gdppc>131100</gdppc>
       <neighbor>E</neighbor>       
    </country>
    <country>
       <countryname>France</countryname>
       <rank>1</rank>
       <year>2018</year>
       <gdppc>191100</gdppc>
       <neighbor> A</neighbor>       
    </country>
    <country>
       <countryname>Italy</countryname>
       <rank>2</rank>
       <year>2020</year>
       <gdppc>181100</gdppc>
       <neighbor> E</neighbor>       
    </country>
</data>

The "If Statement" I have tried so far:

for country in root.findall('country'):
     rank = int(country.find('rank').text)
     if rank == 2:
        print(rank)

for country in root.findall('country'):
     neighbor = text(country.find('neighbor').text)
     if neighbor == E:
         print(neighbor)

But I am getting this error:

if rank ==4:
IndentationError: unexpected indent

I don't know how to print the results for my "If Statement", please help.

Thank you!!

CodePudding user response:

Since you have already decided to use .findall(), you can pass it an XPath expression that returns the <countryname> nodes belonging to <country> nodes whose <rank> child node has a certain text (2 in this case).

import xml.etree.ElementTree as ET

# root initialization
for element in root.findall("./country[rank='2']/countryname"):
    print(element.text)
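As a self-contained sketch of the same predicate, the snippet below parses a trimmed copy of the question's data (three of the five countries, with fewer fields) from a string. The variable names xml_text and rank_two are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's data, kept short for illustration.
xml_text = """<data>
    <country>
        <countryname>Canada</countryname>
        <rank>2</rank>
        <neighbor> E</neighbor>
    </country>
    <country>
        <countryname>USA</countryname>
        <rank>1</rank>
        <neighbor> A</neighbor>
    </country>
    <country>
        <countryname>Mexico</countryname>
        <rank>2</rank>
        <neighbor>E</neighbor>
    </country>
</data>"""

root = ET.fromstring(xml_text)

# [rank='2'] keeps only <country> elements whose <rank> child has the text "2".
rank_two = [el.text for el in root.findall("./country[rank='2']/countryname")]
print(rank_two)  # ['Canada', 'Mexico']
```

The predicate compares the complete text of <rank>, so it matches here only because the rank values carry no surrounding whitespace.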

You can find more information about XPath support in the xml.etree.ElementTree module in its documentation. Note that it supports only a basic abbreviated syntax. If you need extended XPath functionality, take a look at lxml.

For example, ElementTree's XPath subset is not enough to solve the same task based on the <neighbor> node, because its text may contain a leading space. With lxml, which supports XPath functions such as contains(), we can solve this:

from lxml import etree

# root initialization
print(*root.xpath("./country[contains(neighbor, 'E')]/countryname/text()"), sep="\n")
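Keep in mind that contains() is a substring test, so it would also match a hypothetical value such as "NE". If installing lxml is not an option, a possible standard-library alternative is to strip the whitespace in Python and compare exactly. This is only a sketch: the helper name countries_with_neighbor is made up for illustration, and the data is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's data; note the leading space in " E".
xml_text = """<data>
    <country>
        <countryname>Canada</countryname>
        <neighbor> E</neighbor>
    </country>
    <country>
        <countryname>USA</countryname>
        <neighbor> A</neighbor>
    </country>
    <country>
        <countryname>Mexico</countryname>
        <neighbor>E</neighbor>
    </country>
</data>"""

root = ET.fromstring(xml_text)


def countries_with_neighbor(root, code):
    """Return names of countries whose stripped <neighbor> text equals code."""
    names = []
    for country in root.findall("country"):
        # findtext returns the default when the element is missing.
        neighbor = country.findtext("neighbor", default="")
        if neighbor.strip() == code:
            names.append(country.findtext("countryname"))
    return names


east = countries_with_neighbor(root, "E")
print(east)  # ['Canada', 'Mexico']
```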

CodePudding user response:

I am trying to print all the countries that have rank == 2. Also, I am trying to print all the countries with neighbor == E

Using ElementTree, the code below prints the names of the countries where neighbor == E or rank == 2:

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0"?>
<data>
    <country>
       <countryname>Canada</countryname>
       <rank>2</rank>
       <year>2008</year>
       <gdppc>141100</gdppc>
       <neighbor> E</neighbor>       
    </country>
    <country>
       <countryname>USA</countryname> 
       <rank>1</rank>
       <year>2010</year>
       <gdppc>121100</gdppc>
       <neighbor> A</neighbor>       
    </country>
    <country>
       <countryname>Mexico</countryname>
       <rank>2</rank>
       <year>2011</year>
       <gdppc>131100</gdppc>
       <neighbor>E</neighbor>       
    </country>
    <country>
       <countryname>France</countryname>
       <rank>1</rank>
       <year>2018</year>
       <gdppc>191100</gdppc>
       <neighbor> A</neighbor>       
    </country>
    <country>
       <countryname>Italy</countryname>
       <rank>2</rank>
       <year>2020</year>
       <gdppc>181100</gdppc>
       <neighbor> E</neighbor>       
    </country>
</data>'''

root = ET.fromstring(xml)
country_lst = [
    c.find('countryname').text
    for c in root.findall('.//country')
    if c.find('neighbor').text.strip() == 'E' or c.find('rank').text == '2'
]
print(country_lst)

output

['Canada', 'Mexico', 'Italy']
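The comprehension combines both conditions with or. If the two results are wanted separately, as in the question, the original loops could be repaired roughly as follows. This is a sketch on a trimmed copy of the data, and it assumes three fixes: text() is not a Python built-in, so the element text is stripped with .strip() instead; E has to be the string 'E'; and every line inside a block uses the same indentation, since an IndentationError: unexpected indent usually means a line is indented deeper than the block it belongs to.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's data, kept short for illustration.
xml_text = """<data>
    <country>
        <countryname>Canada</countryname>
        <rank>2</rank>
        <neighbor> E</neighbor>
    </country>
    <country>
        <countryname>USA</countryname>
        <rank>1</rank>
        <neighbor> A</neighbor>
    </country>
    <country>
        <countryname>Mexico</countryname>
        <rank>2</rank>
        <neighbor>E</neighbor>
    </country>
</data>"""

root = ET.fromstring(xml_text)

# Countries with rank == 2: convert the text to int before comparing.
rank_matches = []
for country in root.findall("country"):
    rank = int(country.find("rank").text)
    if rank == 2:
        rank_matches.append(country.find("countryname").text)

# Countries with neighbor == E: strip whitespace and compare to the string 'E'.
neighbor_matches = []
for country in root.findall("country"):
    neighbor = country.find("neighbor").text.strip()
    if neighbor == "E":
        neighbor_matches.append(country.find("countryname").text)

print(rank_matches)      # ['Canada', 'Mexico']
print(neighbor_matches)  # ['Canada', 'Mexico']
```

In this data every country with rank 2 also has neighbor E, so both lists come out the same.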