I have this XML file, and I am trying to print all the countries that have rank == 2. I am also trying to print all the countries with neighbor == E.
<?xml version="1.0"?>
<data>
<country>
<countryname>Canada</countryname>
<rank>2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor> E</neighbor>
</country>
<country>
<countryname>USA</countryname>
<rank>1</rank>
<year>2010</year>
<gdppc>121100</gdppc>
<neighbor> A</neighbor>
</country>
<country>
<countryname>Mexico</countryname>
<rank>2</rank>
<year>2011</year>
<gdppc>131100</gdppc>
<neighbor>E</neighbor>
</country>
<country>
<countryname>France</countryname>
<rank>1</rank>
<year>2018</year>
<gdppc>191100</gdppc>
<neighbor> A</neighbor>
</country>
<country>
<countryname>Italy</countryname>
<rank>2</rank>
<year>2020</year>
<gdppc>181100</gdppc>
<neighbor> E</neighbor>
</country>
</data>
The "If Statement" I have tried so far:
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank == 2:
        print(rank)
for country in root.findall('country'):
    neighbor = text(country.find('neighbor').text)
    if neighbor == E:
        print(neighbor)
But I am getting this error:
if rank ==4:
IndentationError: unexpected indent
I don't know how to print the results for my "If Statement", please help.
Thank you!!
CodePudding user response:
If you have already decided to use .findall(), you can pass it an XPath expression that returns the <countryname> node of every <country> node whose <rank> sub-node contains a certain text (2 in this case).
import xml.etree.ElementTree as ET
# root initialization
for element in root.findall("./country[rank='2']/countryname"):
    print(element.text)
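For a version that runs on its own, here is a minimal sketch of the same predicate. It assumes the XML is parsed from an inline string (the name SAMPLE_XML and the shortened three-country data are made up for illustration) rather than from the asker's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample, same element names as in the question.
SAMPLE_XML = """<data>
<country><countryname>Canada</countryname><rank>2</rank></country>
<country><countryname>USA</countryname><rank>1</rank></country>
<country><countryname>Mexico</countryname><rank>2</rank></country>
</data>"""

root = ET.fromstring(SAMPLE_XML)

# [rank='2'] keeps only <country> elements whose <rank> child text is exactly "2".
rank_two = [element.text for element in root.findall("./country[rank='2']/countryname")]
print(rank_two)
```

Note that the predicate compares text, so the value is written as the string '2', not as a number.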
You can find more information about XPath support in the xml.etree.ElementTree module in its documentation. Note that it supports only a basic abbreviated syntax. If you need the extended functionality of XPath, take a look at lxml.
For example, the XPath support in the standard ElementTree is not enough to solve the same task based on the <neighbor> node, because its text may contain a leading space and an ElementTree predicate only matches the complete text exactly. Using lxml, we can solve this:
from lxml import etree
# root initialization
print(*root.xpath("./country[contains(neighbor, 'E')]/countryname/text()"), sep="\n")
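To see the limitation with the standard library alone, here is a small sketch. It assumes a shortened inline sample (the name XML and the three-country data are made up for illustration). The exact-match predicate skips the entry with the leading space, while filtering in Python with str.strip() finds both.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample; Canada's <neighbor> has a leading space.
XML = """<data>
<country><countryname>Canada</countryname><neighbor> E</neighbor></country>
<country><countryname>Mexico</countryname><neighbor>E</neighbor></country>
<country><countryname>USA</countryname><neighbor> A</neighbor></country>
</data>"""

root = ET.fromstring(XML)

# ElementTree compares the complete text, so " E" does not match 'E'.
exact = [e.text for e in root.findall("./country[neighbor='E']/countryname")]

# Stripping whitespace in Python before comparing catches both spellings.
stripped = [
    c.findtext("countryname")
    for c in root.findall("country")
    if (c.findtext("neighbor") or "").strip() == "E"
]

print(exact)
print(stripped)
```

Also keep in mind that contains(neighbor, 'E') in the lxml version matches any <neighbor> text that includes the letter E, not only values equal to E.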
CodePudding user response:
I am trying to print all the countries that have rank == 2. Also, I am trying to print all the countries with neighbor == E
Using ElementTree, the code below prints the names of the countries where neighbor == E or rank == 2.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0"?>
<data>
<country>
<countryname>Canada</countryname>
<rank>2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor> E</neighbor>
</country>
<country>
<countryname>USA</countryname>
<rank>1</rank>
<year>2010</year>
<gdppc>121100</gdppc>
<neighbor> A</neighbor>
</country>
<country>
<countryname>Mexico</countryname>
<rank>2</rank>
<year>2011</year>
<gdppc>131100</gdppc>
<neighbor>E</neighbor>
</country>
<country>
<countryname>France</countryname>
<rank>1</rank>
<year>2018</year>
<gdppc>191100</gdppc>
<neighbor> A</neighbor>
</country>
<country>
<countryname>Italy</countryname>
<rank>2</rank>
<year>2020</year>
<gdppc>181100</gdppc>
<neighbor> E</neighbor>
</country>
</data>'''
root = ET.fromstring(xml)
country_lst = [c.find('countryname').text for c in root.findall('.//country') if c.find('neighbor').text.strip() == 'E' or c.find('rank').text == '2']
print(country_lst)
output
['Canada', 'Mexico', 'Italy']
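If the two conditions should be reported separately, as the question asks, one possible sketch is shown below. The helper names countries_with_rank and countries_with_neighbor are made up for illustration, and the data is a compact copy of the question's XML with the year and gdppc elements left out. Compared with the loops in the question, the neighbor text is stripped and compared with the quoted string 'E', and the country name is collected instead of the rank.

```python
import xml.etree.ElementTree as ET

# Compact copy of the question's data (year and gdppc omitted for brevity).
XML = """<data>
<country><countryname>Canada</countryname><rank>2</rank><neighbor> E</neighbor></country>
<country><countryname>USA</countryname><rank>1</rank><neighbor> A</neighbor></country>
<country><countryname>Mexico</countryname><rank>2</rank><neighbor>E</neighbor></country>
<country><countryname>France</countryname><rank>1</rank><neighbor> A</neighbor></country>
<country><countryname>Italy</countryname><rank>2</rank><neighbor> E</neighbor></country>
</data>"""


def countries_with_rank(root, rank):
    """Return names of countries whose <rank> equals the given integer."""
    return [
        c.findtext("countryname")
        for c in root.findall("country")
        if int(c.findtext("rank")) == rank
    ]


def countries_with_neighbor(root, neighbor):
    """Return names of countries whose <neighbor> text, stripped, equals neighbor."""
    return [
        c.findtext("countryname")
        for c in root.findall("country")
        if (c.findtext("neighbor") or "").strip() == neighbor
    ]


root = ET.fromstring(XML)
rank_two = countries_with_rank(root, 2)
neighbor_e = countries_with_neighbor(root, "E")
print(rank_two)
print(neighbor_e)
```

With this data both lists happen to contain the same three countries, because every country with rank 2 also has neighbor E.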