BeautifulSoup unable to find XML element with special characters in name

Time:09-07

I'm trying to parse an XML document written by a BI software program (Tableau, specifically!). I'm using BS4 and have tried several solutions from other StackOverflow posts, none of which have worked for me. Hoping someone will be able to point out what I'm doing wrong.

This is my XML:
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>

And I've cleaned up code below so I can post it here:

# Parsing the tree
soup = BeautifulSoup(xmlstr, 'lxml')
print(soup.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph"))
# This works! Prints the object markup

datasources = soup.find('datasources').find_all('datasource')
for ds in datasources:
    print(ds['caption'])
    print(ds['name'])
    # This works!

    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)
    # This doesn't work! find() returns None

    for tag in ds:
        if tag.name == "_.fcp.objectmodelencapsulatelegacy.true...object-graph":
            print(tag.name)
            # This works ^^

As you can tell, the element definitely exists within the tag it's supposed to be in. Iterating over the children of the datasource yields the element I'm looking for, and comparing tag.name against the expected name confirms it's there. But for some reason, when I call find or find_all on the datasource, I keep getting None. I thought the issue was with the name (as some StackOverflow posts suggested), but apparently not, since soup.find catches the element. So I'm at a loss; any help would be appreciated.

Thanks!

CodePudding user response:

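Try the following code. It searches from the <datasources> root, reads caption from <object> and name from <relation> (the sample <datasource> carries no attributes of its own), and then looks up the specially named tag using its lowercase form.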

from bs4 import BeautifulSoup

xmlstr = '''
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>
'''
soup = BeautifulSoup(xmlstr, 'lxml')

datasources = soup.find_all('datasources')  # .find_all('datasource')
for ds in datasources:
    print(ds.find('object')['caption'])
    print(ds.find('relation')['name'])
    # This works!

    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)

Output:

table
Extract
_.fcp.objectmodelencapsulatelegacy.true...object-graph
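
The output above shows the tag name in lowercase, which is how it comes back from the 'lxml' parser here. As a cross-check that doesn't depend on BeautifulSoup, here is a minimal sketch using the standard library's xml.etree.ElementTree, which is case-sensitive and keeps the original spelling of the tag. The names XML, GRAPH_TAG and find_object_graphs are made up for illustration, and this is only one possible approach under the assumption that the file is well-formed XML like the sample.

```python
import xml.etree.ElementTree as ET

XML = """<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>"""

# ElementTree is case-sensitive, so the tag is spelled exactly as in the file.
GRAPH_TAG = "_.fcp.ObjectModelEncapsulateLegacy.true...object-graph"


def find_object_graphs(xml_text):
    """Return one entry per <datasource>: the graph element, or None if absent."""
    root = ET.fromstring(xml_text)
    results = []
    for ds in root.iter("datasource"):
        # iter() matches the tag name literally (no path parsing);
        # next(..., None) avoids an AttributeError when the tag is missing.
        results.append(next(ds.iter(GRAPH_TAG), None))
    return results


graphs = find_object_graphs(XML)
for graph in graphs:
    if graph is None:
        print("object-graph not found in this datasource")
    else:
        relation = graph.find(".//relation")
        print(graph.tag, relation.get("name"))
```

Checking for None before touching .name or .tag also turns the "returns None" case from the question into an explicit message instead of an exception.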