I have an XML file containing data in this form:
<head xml:id="_2ebf9c0003">\n\nTECHNICAL FIELD</head>\n
<p n="0001" xml:id="_2ebf9c0004">whatever</p>
<p n="0002" xml:id="_2ebf9c0004">whatever</p>
<... other tags and data...>
<head xml:id="_2ebf9c0003">\n\nTITLE</head>\n
I know how to get particular elements like:
from bs4 import BeautifulSoup
soup = BeautifulSoup(PDM_description, 'lxml')
title_element = soup.title
Getting all p elements:
paras = soup.findAll('p')
The question is: how can I add an OR to the query to get a list of "p" or "head" elements? More generally, how can I get all the elements whose tags belong to a given list?
PSEUDO CODE:
paras = soup.findAll('p' OR 'head')
CodePudding user response:
You are close to your goal; just pass a list of tags to find_all():
soup.find_all(['p','head'])
Note: In new code, use find_all() instead of the older findAll() syntax.
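For reference, here is a minimal, self-contained sketch of the list form of find_all(). The sample string and the variable names are made up to resemble the data in the question, and it uses the built-in "html.parser" instead of 'lxml' only so that it runs without extra packages; treat it as an illustration, not as the asker's actual setup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the data in the question.
sample = (
    '<doc>'
    '<head xml:id="h1">TECHNICAL FIELD</head>'
    '<p n="0001">first</p>'
    '<p n="0002">second</p>'
    '<note>skipped</note>'
    '<head xml:id="h2">TITLE</head>'
    '</doc>'
)

# Assumption: "html.parser" is used here because it ships with Python.
soup = BeautifulSoup(sample, "html.parser")

# Passing a list matches any tag whose name is in the list.
wanted = ["p", "head"]
elements = soup.find_all(wanted)

names = [el.name for el in elements]
texts = [el.get_text(strip=True) for el in elements]
print(names)  # ['head', 'p', 'p', 'head']
print(texts)  # ['TECHNICAL FIELD', 'first', 'second', 'TITLE']
```

The results come back in document order, so the head and p elements stay interleaved as they appear in the file.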
CodePudding user response:
You can use a CSS selector list: write your tags separated by a comma (,). To run a CSS selector, use the .select() method:
print(
soup.select("p, head")
)
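For the more general case of an arbitrary list of tags, the selector string can be built from the list. The following is only a sketch under assumptions: the sample data and the names tags and selector are hypothetical, and "html.parser" is used so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the data in the question.
sample = (
    '<doc>'
    '<head xml:id="h1">TECHNICAL FIELD</head>'
    '<p n="0001">first</p>'
    '<p n="0002">second</p>'
    '<note>skipped</note>'
    '<head xml:id="h2">TITLE</head>'
    '</doc>'
)

# Assumption: "html.parser" is used here because it ships with Python.
soup = BeautifulSoup(sample, "html.parser")

# Join the tag names with commas to form a CSS selector list.
tags = ["p", "head"]
selector = ", ".join(tags)  # "p, head"

selected = soup.select(selector)
selected_names = [el.name for el in selected]
print(selected_names)  # ['head', 'p', 'p', 'head']
```

This only works as written for plain tag names; names containing characters that are special in CSS would need escaping, in which case passing the list to find_all() is simpler.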