Here is the layout of the XML file I am parsing. When a tag such as driverslicense carries several values, I want to extract each name and its value, e.g. {number: 99999999, state: CA}.
"""<subjects>
<subject id="B6">
<name type="primary">
<first>Frank </first>
<middle></middle>
<last>Darko</last>
</name>
<birthdate>10/26/2001</birthdate>
<age>17</age>
<ssn>12345679</ssn>
<description>
<sex>Male</sex>
</description>
<address type="residence" ref="A1"/>
<driverslicense state="CA" number="99999999"/>
</subject>
</subjects>"""
My code is as follows:
dl = bs_data.find("driverslicense")
Output:
<driverslicense number="T64430698" state="VA"/>
I tried a for loop, but it printed nothing. I also tried .text, but that returned nothing either.
for i in bs_data.find('driverslicense'):
    print(i)
------------------
DriverLicense = bs_data.find("driverslicense")
print(DriverLicense.text)
I would prefer the result as a dictionary, but separate variables such as state = CA and number = 99999999 would work as well.
CodePudding user response:
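state and number are attributes of the driverslicense tag, not text inside it, which is why .text is empty and looping over the tag yields nothing. Here is one way to read them: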
from bs4 import BeautifulSoup
html = '''
<subjects>
<subject id="B6">
<name type="primary">
<first>Frank </first>
<middle></middle>
<last>Darko</last>
</name>
<birthdate>10/26/2001</birthdate>
<age>17</age>
<ssn>12345679</ssn>
<description>
<sex>Male</sex>
</description>
<address type="residence" ref="A1"/>
<driverslicense state="CA" number="99999999"/>
</subject>
</subjects>
'''
soup = BeautifulSoup(html, 'html.parser')
# Look the tag up once, then read each attribute with .get()
dl = soup.select_one('driverslicense')
dict_with_stuff = {'state': dl.get('state'), 'number': dl.get('number')}
print(dict_with_stuff)
Result:
{'state': 'CA', 'number': '99999999'}
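If you do not want to name each attribute by hand, a tag's .attrs mapping already holds all of them, and find_all covers files with more than one subject. The following is only a rough sketch: the second subject (id B7) and the variable names license_info and all_licenses are made up for illustration, and the XML is trimmed to the relevant tags.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample; the second <subject> is invented for illustration.
xml = '''
<subjects>
  <subject id="B6">
    <driverslicense state="CA" number="99999999"/>
  </subject>
  <subject id="B7">
    <driverslicense state="VA" number="T64430698"/>
  </subject>
</subjects>
'''

soup = BeautifulSoup(xml, 'html.parser')

# .attrs maps every attribute name on the tag to its value.
# find() returns None when nothing matches, so guard against that.
first = soup.find('driverslicense')
license_info = dict(first.attrs) if first is not None else {}
print(license_info)

# find_all() returns every matching tag, giving one dict per license.
all_licenses = [dict(tag.attrs) for tag in soup.find_all('driverslicense')]
print(all_licenses)
```

For independent variables instead of a dictionary, license_info['state'] and license_info['number'] can be assigned to whatever names you like.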
BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/index.html