How to parse information from a child tag in an XML file using BeautifulSoup?

Time:08-19

Here is the layout of the XML file that I am parsing. Whenever a tag such as driverslicense carries multiple values, I want to extract each name and its value, e.g. {number: 99999999, state: CA}

<subjects>
    <subject id="B6">
        <name type="primary">
            <first>Frank </first>
            <middle></middle>
            <last>Darko</last>
        </name>
        <birthdate>10/26/2001</birthdate>
        <age>17</age>
        <ssn>12345679</ssn>
        <description>
            <sex>Male</sex>
        </description>
        <address type="residence" ref="A1"/>
        <driverslicense state="CA" number="99999999"/>
    </subject>
</subjects>

My code is as follows:

dl = bs_data.find("driverslicense")

Output:

<driverslicense number="T64430698" state="VA"/>

I tried a for loop, but it returns no values, and .text comes back empty as well.

for i in bs_data.find('driverslicense'):
    print(i)
------------------
DriverLicense = bs_data.find("driverslicense")
print(DriverLicense.text)

I would prefer to get this as a dictionary, but separate variables such as state = CA and number = 99999999 would work as well.

CodePudding user response:

This is one way you can get that info:

from bs4 import BeautifulSoup
html = '''
<subjects>
    <subject id="B6">
        <name type="primary">
            <first>Frank </first>
            <middle></middle>
            <last>Darko</last>
        </name>
        <birthdate>10/26/2001</birthdate>
        <age>17</age>
        <ssn>12345679</ssn>
        <description>
            <sex>Male</sex>
        </description>
        <address type="residence" ref="A1"/>
        <driverslicense state="CA" number="99999999"/>
    </subject>
</subjects>
'''
soup = BeautifulSoup(html, 'html.parser')
# state and number are attributes of the tag, not text, so read them with .get()
dl = soup.select_one('driverslicense')
dict_with_stuff = {'state': dl.get('state'), 'number': dl.get('number')}
print(dict_with_stuff)

Result:

{'state': 'CA', 'number': '99999999'}
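
The attempts in the question most likely come back empty because driverslicense is a self-closing tag: it has no child nodes to loop over and no text, only attributes. As a rough sketch of a more general approach, a tag's .attrs is already a dictionary of every attribute, and find_all covers the case of several such tags. The second subject (id B7) below is made up for illustration and is not part of the original file.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample; the second <subject> (B7) is hypothetical
xml = '''
<subjects>
    <subject id="B6">
        <driverslicense state="CA" number="99999999"/>
    </subject>
    <subject id="B7">
        <driverslicense state="VA" number="T64430698"/>
    </subject>
</subjects>
'''

soup = BeautifulSoup(xml, 'html.parser')

# .attrs holds all attributes of the tag as a dict; copy it so later
# changes to the dict do not touch the parsed tree
first = soup.find('driverslicense')
first_license = dict(first.attrs)

# find_all returns every matching tag, so this yields one dict per license
all_licenses = [dict(tag.attrs) for tag in soup.find_all('driverslicense')]

print(first_license)
print(all_licenses)

# A self-closing tag has no children and no text, which is why looping
# over it prints nothing and .text is an empty string
print(list(first), repr(first.text))
```

With this approach there is no need to name each attribute in advance, which may help if some tags carry extra attributes.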

BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
