How to get the value of a tag with multiple similar siblings?

Time:08-19

I am using BeautifulSoup and Python to parse XML data. In this specific XML document, there are multiple sibling elements that hold different names and values, but they all share the same child tag names.

I have attached an image of the directory and how the layout exists. I am trying to get the value for Occupation, MIBCarrierCode, MIBTestIndicator, and so on.

Schema of Parent, Child and Children directory

from bs4 import BeautifulSoup

with open("arc45.xml", 'r') as file:
    data = file.read()

Bs_data = BeautifulSoup(data, "xml")

Occupation = Bs_data.find_all("Name")
print(Occupation)

Output:

[-Name-Occupation-/Name-, -Name-CarrierCode-/Name-, -Name-TestIndicator-/Name-, -Name-LineOfBusinessCode-/Name-]

This only gives me the "Name" tags, but I need to grab the matching value and assign it to an Occupation variable.

If I search for "Value" instead, I receive this output:

Occupation = Bs_data.find_all("Value")
print(Occupation)

Output:

[-Value-Unknown-/Value-, -Value-111-/Value-, -Value-0-/Value-, -Value-1-/Value-]

I need to grab the value when the tag is Occupation or CarrierCode, and so on.

This is an example of the layout of the XML file.

<AdditonalAttributes>
    <Attribute>
        <Name> Occupation </Name>
        <Value> Unknown </Value>
    </Attribute>
    <Attribute>
        <Name> CarrierCode </Name>
        <Value> 656 </Value>
    </Attribute>

  • In the printed outputs above, each - symbol stands in for < or >, so that the XML tags are shown without the symbols disappearing.

Just not quite sure how to parse this information.

CodePudding user response:

You could select() or find_all() the <Attribute> elements and check each one's <Name> against a whitelist (or whatever condition you need) while iterating over the ResultSet:

for a in soup.select('Attribute'):
    if a.Name.get_text(strip=True) in ['Occupation','CarrierCode']:
        print(a.Value.get_text(strip=True))
Example
from bs4 import BeautifulSoup
xml='''
<AdditonalAttributes>
    <Attribute>
        <Name> Occupation </Name> 
        <Value> Unknown </Value>
    </Attribute>
    <Attribute>
        <Name> CarrierCode </Name>
        <Value> 656 </Value>
    </Attribute>
</AdditonalAttributes>
'''
soup = BeautifulSoup(xml, 'xml')

for a in soup.select('Attribute'):
    if a.Name.get_text(strip=True) in ['Occupation','CarrierCode']:
        print(a.Value.get_text(strip=True))

Output

Unknown
656
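
Since the question asks for each value to end up in its own variable, one possible extension is to collect every Name/Value pair into a dictionary and look values up by name. This is only a minimal sketch built on the example above: the helper name attributes_to_dict and the variables occupation and carrier_code are hypothetical, and the "xml" parser assumes lxml is installed.

```python
from bs4 import BeautifulSoup  # the "xml" parser requires lxml to be installed

xml = """
<AdditonalAttributes>
    <Attribute>
        <Name> Occupation </Name>
        <Value> Unknown </Value>
    </Attribute>
    <Attribute>
        <Name> CarrierCode </Name>
        <Value> 656 </Value>
    </Attribute>
</AdditonalAttributes>
"""

soup = BeautifulSoup(xml, "xml")


def attributes_to_dict(soup):
    """Map each <Attribute>'s <Name> text to its <Value> text.

    Hypothetical helper, not part of BeautifulSoup.
    """
    result = {}
    for attr in soup.find_all("Attribute"):
        name = attr.find("Name")
        value = attr.find("Value")
        # Skip attributes that lack either child tag.
        if name is None or value is None:
            continue
        result[name.get_text(strip=True)] = value.get_text(strip=True)
    return result


attributes = attributes_to_dict(soup)

# .get() returns None instead of raising if a name is absent.
occupation = attributes.get("Occupation")
carrier_code = attributes.get("CarrierCode")

print(occupation)
print(carrier_code)
```

If the same Name can appear more than once in the real file, the later value overwrites the earlier one here, so a list of values per name may be a better fit in that case.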