How can I get the string value of an XML element named "name" with BeautifulSoup4?

When I try to parse an XML element (tag) named "Name" with BeautifulSoup4:

exemplary_xml = '''
<SomeTag>
    <UsualTag>abc</UsualTag>
    <Name>xyz</Name>
</SomeTag>
'''

soup = BeautifulSoup(exemplary_xml, parser="xml")
print(soup.sometag.usualtag.string)
print(soup.sometag.name.string)

I get an error, because soup.sometag.name conflicts with the .name attribute that the API uses for a tag's own name:

abc
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [16], in <module>
      8 soup = BeautifulSoup(exemplary_xml, parser="lxml")
      9 print(soup.sometag.usualtag.string)
---> 10 print(soup.sometag.name.string)

AttributeError: 'str' object has no attribute 'string'

How can I get the string value of an XML element/tag named "name"?

CodePudding user response:

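The way you're using bs4 here is odd and deprecated. Select the parser with the features argument, then look elements up by tag name with find() or find_all() instead of attribute access. Attribute access cannot reach a tag called Name, because .name is reserved for the tag's own name. Note that with features="xml" tag names keep their original case.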

For example:

from bs4 import BeautifulSoup

exemplary_xml = '''
<SomeTag>
    <UsualTag>abc</UsualTag>
    <Name>xyz</Name>
</SomeTag>
'''

soup = BeautifulSoup(exemplary_xml, features="xml")
print(soup.find("UsualTag").string)
print(soup.find("Name").string)

Output:

abc
xyz
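
To make the difference visible, here is a minimal sketch that puts the two lookups side by side. It assumes the lxml package is installed (features="xml" needs it); the variable names parent, own_name and name_value are only illustrative.

```python
from bs4 import BeautifulSoup

exemplary_xml = """
<SomeTag>
    <UsualTag>abc</UsualTag>
    <Name>xyz</Name>
</SomeTag>
"""

# Assumption: lxml is installed, since features="xml" relies on it.
soup = BeautifulSoup(exemplary_xml, features="xml")

parent = soup.find("SomeTag")

# .name is the tag's own name as a plain str, not a child lookup,
# which is why .name.string raises AttributeError.
own_name = parent.name
print(type(own_name).__name__, own_name)

# find() looks the child element up by its tag name instead.
name_value = parent.find("Name").string
print(name_value)
```

So parent.name never searches the children, while parent.find("Name") does.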

CodePudding user response:

Right now I use a workaround: soup.sometag.find("name").string. This is probably not optimal performance-wise, and there is likely a better way of identifying the XML element.
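
One possible refinement of this workaround, sketched here under the assumption that the wanted element is a direct child of its parent: passing recursive=False to find() restricts the search to direct children instead of walking the whole subtree. Whether this matters for performance depends on the document size. The helper child_text is hypothetical and not part of bs4, and the sketch uses features="xml" (lxml required), so tag names are case-sensitive.

```python
from bs4 import BeautifulSoup

exemplary_xml = """
<SomeTag>
    <UsualTag>abc</UsualTag>
    <Name>xyz</Name>
</SomeTag>
"""

# Assumption: lxml is installed, since features="xml" relies on it.
soup = BeautifulSoup(exemplary_xml, features="xml")


def child_text(parent, tag_name):
    """Hypothetical helper: text of a direct child tag, or None if absent."""
    # recursive=False checks only the direct children of parent.
    child = parent.find(tag_name, recursive=False)
    return None if child is None else child.string


some_tag = soup.find("SomeTag")
name_text = child_text(some_tag, "Name")
missing_text = child_text(some_tag, "DoesNotExist")
print(name_text)
print(missing_text)
```

Returning None for a missing child avoids an AttributeError on .string when the element is absent.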