Parsing XML with bs4 in Python to get a list of elements

Time:02-10

I have an xml file containing data in this form:

<head xml:id="_2ebf9c0003">\n\nTECHNICAL FIELD</head>\n
<p n="0001" xml:id="_2ebf9c0004">whatever</p>
<p n="0002" xml:id="_2ebf9c0004">whatever</p>
<... other tags and data...>
<head xml:id="_2ebf9c0003">\n\nTITLE</head>\n

I know how to get particular elements like:

from bs4 import BeautifulSoup
soup = BeautifulSoup(PDM_description, 'lxml')
title_element = soup.title
# get all p elements
paras = soup.findAll('p')

The question is: how can I add an OR to the query to get a list of "p" or "head" elements? More generally, how do I get all the elements whose tag names belong to a given list?

PSEUDO CODE:

paras = soup.findAll('p' OR 'head')    

CodePudding user response:

You are close to your goal. Just pass a list of tag names to find_all():

soup.find_all(['p','head'])

Note: In new code, use find_all() instead of the older findAll() spelling.
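
As a minimal sketch of the list form in context: the sample data below is modelled on the question, but the ids, the text, and the variable names (xml_data, wanted_tags) are placeholders, not the asker's real file. The "html.parser" backend is an assumption made so the sketch runs without installing lxml; the question itself uses 'lxml'.

```python
from bs4 import BeautifulSoup

# Placeholder data shaped like the snippet in the question.
xml_data = """
<head xml:id="h1">TECHNICAL FIELD</head>
<p n="0001" xml:id="p1">first paragraph</p>
<p n="0002" xml:id="p2">second paragraph</p>
<head xml:id="h2">TITLE</head>
"""

# Assumption: the built-in "html.parser" backend, so no extra install is needed.
soup = BeautifulSoup(xml_data, "html.parser")

# Passing a list matches any element whose tag name is in the list.
wanted_tags = ["p", "head"]
elements = soup.find_all(wanted_tags)

# Collect the tag name and the stripped text of each match.
tag_names = [el.name for el in elements]
texts = [el.get_text(strip=True) for el in elements]
```

The matches come back in document order, so "head" and "p" elements stay interleaved as they appear in the file.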

CodePudding user response:

You can also use a CSS selector list: write the tag names separated by commas (,) and pass the selector to the .select() method:

print(
    soup.select("p, head")
)
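
For the general case of an arbitrary list of tag names, the selector string can be built from the list. This is only a sketch: the data, ids, and names (xml_data, wanted_tags, selector) are placeholders, and "html.parser" is assumed so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Placeholder data shaped like the snippet in the question.
xml_data = """
<head xml:id="h1">TECHNICAL FIELD</head>
<p n="0001" xml:id="p1">first paragraph</p>
<p n="0002" xml:id="p2">second paragraph</p>
<head xml:id="h2">TITLE</head>
"""

# Assumption: the built-in "html.parser" backend, so no extra install is needed.
soup = BeautifulSoup(xml_data, "html.parser")

# Join the tag names into a comma-separated CSS selector list: "p, head".
wanted_tags = ["p", "head"]
selector = ", ".join(wanted_tags)
elements = soup.select(selector)

# Read the tag name and the xml:id attribute of each match.
selected_names = [el.name for el in elements]
ids = [el.get("xml:id") for el in elements]
```

Both approaches return the same elements here; .select() is handy when the query later needs more than tag names, such as attribute filters.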