Any help is appreciated! Using the example XML file below, I'm getting incorrect output.
Incorrect output:
Emp_F_Name: Jill
Emp_M_Name: H
Emp_L_Name: Jones
Desired output:
Emp_F_Name: Jill
Emp_M_Name: None or NULL
Emp_L_Name: Jones
I'm not sure why the find_next() function is going outside the declared element (employee).
<?xml version="1.0" encoding="utf-8"?>
<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>
Here's the code I'm using.
employee = soup.find("employee")
for i in employee.find_all('name'):
    fname = employee.find('given')
    print("Emp_F_Name: ", fname.get_text())
    mname = fname.find_next('given')
    print("Emp_M_Name: ", mname.get_text())
    lname = employee.find('family')
    print("Emp_L_Name: ", lname.get_text())
When I run the same code for the manager, it seems to work:
manager = soup.find("manager")
Answer:
If the structure is always nearly identical, you can use find_all() to collect all <given> elements
and check whether there is one or two.
given = i.find_all('given')
fname = given[0]
print("Emp_F_Name: ", fname.get_text())
mname = given[1].get_text() if len(given) > 1 else None
print("Emp_M_Name: ", mname)
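I don't think you need to iterate over employee at all, but if you do, search inside your loop variable i instead of employee.
Note that find_next() is not limited to the current element's descendants: it searches everything that comes after the element in the whole document, which is how it reaches the <given> tags inside <manager>.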
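To make the difference between a document-order search and a search scoped to employee concrete, here is a small sketch using the standard library's xml.etree.ElementTree. ElementTree has no find_next(), so this is an analogy rather than Beautiful Soup's own implementation: root.iter() stands in for "everything in document order", and employee.iter() for "only inside employee". The XML constant is just the sample from the question without the XML declaration.

```python
import xml.etree.ElementTree as ET

XML = """<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>"""

root = ET.fromstring(XML)
employee = root.find("employee")
first_given = employee.find(".//given")  # the employee's first <given>

# Document-order search: every <given> in the whole tree, in the order it appears.
# This mimics what a "next match after this element" search can see.
all_given = list(root.iter("given"))
idx = all_given.index(first_given)
next_in_document = all_given[idx + 1] if idx + 1 < len(all_given) else None

# Scoped search: only <given> elements that are descendants of <employee>.
employee_given = list(employee.iter("given"))
next_in_employee = employee_given[1] if len(employee_given) > 1 else None

print("next <given> in document:",
      next_in_document.text if next_in_document is not None else None)
print("next <given> inside employee:",
      next_in_employee.text if next_in_employee is not None else None)
```

With this sample, the document-order search lands on a <given> that belongs to <manager>, while the scoped search correctly finds nothing.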
Example
from bs4 import BeautifulSoup

xml = '''<?xml version="1.0" encoding="utf-8"?>
<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>'''

soup = BeautifulSoup(xml, 'lxml')
employee = soup.find("employee")
for i in employee.find_all('name'):
    # Only <given> tags inside this <name> are collected
    given = i.find_all('given')
    fname = given[0]
    print("Emp_F_Name: ", fname.get_text())
    mname = given[1].get_text() if len(given) > 1 else None
    print("Emp_M_Name: ", mname)
    lname = i.find('family')
    print("Emp_L_Name: ", lname.get_text())
Output
Emp_F_Name: Jill
Emp_M_Name: None
Emp_L_Name: Jones
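The same find_all-then-index idea can be applied to every person in the file, not just employee. Below is a sketch using the standard library's ElementTree instead of Beautiful Soup, so it runs without extra packages; the helper person_names() and the XML constant are hypothetical names for illustration. Because each lookup uses a path relative to one person element, it cannot pick up another person's tags.

```python
import xml.etree.ElementTree as ET

XML = """<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>"""

def person_names(person):
    """Return (first, middle, last) for one <employee>/<manager> element.

    Paths are relative to `person`, so other people's tags are never seen.
    The middle name is None when <name> has only one <given>.
    """
    given = [g.text for g in person.findall("name/given")]
    first = given[0] if given else None
    middle = given[1] if len(given) > 1 else None
    last = person.findtext("name/family")
    return first, middle, last

root = ET.fromstring(XML)
for person in root:  # direct children of <org>: <employee>, <manager>
    first, middle, last = person_names(person)
    print(f"{person.tag}: F={first} M={middle} L={last}")
```

This assumes every person element has the same <name>/<given>/<family> layout as the sample; a different layout would need different paths.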
Alternative
Isolate employee as a separate tree so that find_next() cannot reach elements outside it:
employee = BeautifulSoup(str(soup.find("employee")), 'lxml')
for i in employee.find_all('name'):
    fname = i.find('given')
    print("Emp_F_Name: ", fname.get_text())
    # In the isolated tree there is no later <given>, so find_next() returns None
    next_given = fname.find_next('given')
    mname = next_given.get_text() if next_given else None
    print("Emp_M_Name: ", mname)
    lname = i.find('family')
    print("Emp_L_Name: ", lname.get_text())
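Re-parsing is one way to stop the search from leaving employee; another is to look only at later siblings of the first <given>, which can never leave their shared <name> parent. Beautiful Soup provides find_next_sibling() for this kind of sibling-only search. The sketch below shows the same idea with the standard library's ElementTree, which has no such method; next_sibling() is a hypothetical helper written for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>"""

def next_sibling(parent, child, tag):
    """Return the first element after `child` among `parent`'s direct
    children whose tag is `tag`, or None. The search never leaves `parent`."""
    siblings = list(parent)
    start = siblings.index(child) + 1
    for elem in siblings[start:]:
        if elem.tag == tag:
            return elem
    return None

root = ET.fromstring(XML)
for person in root:
    name = person.find("name")
    fname = name.find("given")
    mname = next_sibling(name, fname, "given")
    print(person.tag, fname.text, mname.text if mname is not None else None)
```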
Answer:
Using the standard library's XML parser (no external library needed):
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<org value="Tech">
<employee>
<name>
<family>Jones</family>
<given>Jill</given>
</name>
</employee>
<manager>
<name>
<family>Fisher</family>
<given>Junior</given>
<given>H</given>
</name>
</manager>
</org>'''
attrs = {'Emp_F_Name': 'given',
         'Emp_L_Name': 'family',
         'Emp_M_Name': None}  # no tag mapped, so this always prints None
root = ET.fromstring(xml)
name = root.find('.//name')  # first <name> in document order: the employee's
for k, v in attrs.items():
    print(f'{k}: {name.find(v).text if v else None}')
Output
Emp_F_Name: Jill
Emp_L_Name: Jones
Emp_M_Name: None
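The mapping above hard-codes Emp_M_Name to None, so it would also print None for the manager, who does have a second <given>. A sketch of one way to keep the dictionary-driven style while actually reading the middle name: map each label to a (tag, position) pair and look it up inside a single <name> element. FIELDS, extract(), and XML are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

XML = """<org value="Tech">
  <employee>
    <name>
      <family>Jones</family>
      <given>Jill</given>
    </name>
  </employee>
  <manager>
    <name>
      <family>Fisher</family>
      <given>Junior</given>
      <given>H</given>
    </name>
  </manager>
</org>"""

# Hypothetical mapping: label -> (tag, position among that tag's matches).
FIELDS = {
    "Emp_F_Name": ("given", 0),
    "Emp_M_Name": ("given", 1),
    "Emp_L_Name": ("family", 0),
}

def extract(name_elem):
    """Look up each field among one <name>'s direct children.
    A position that does not exist (e.g. no second <given>) becomes None."""
    result = {}
    for label, (tag, pos) in FIELDS.items():
        matches = name_elem.findall(tag)
        result[label] = matches[pos].text if pos < len(matches) else None
    return result

root = ET.fromstring(XML)
for label, value in extract(root.find("employee/name")).items():
    print(f"{label}: {value}")
```

Passing root.find("manager/name") to the same function picks up the manager's second <given> as the middle name.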