How to get all elements before <p><strong> tag using BeautifulSoup?

Time:11-16

I want to collect all elements that come before the first <p><strong> and exit the loop once it is found.

from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')

for data in soup:
    print(data.previous_sibling)
    print(data.nextSibling.name)
    if nextSibling.name == '<p><strong>':
       print('found and add before content in variable')

The output variable should contain:

This should be in before section<p>Content before</p>

Edit: I also tried the code below:

res = []
for sibling in soup.find('p').previous_siblings:
    res.append(sibling.text)
    
res.reverse()
res = ' '.join(res)

print(res)

It should match <p><strong>, not just <p>, and I am not sure how to do that.

CodePudding user response:

I found a solution. Others may find it useful, so I am posting my answer here:

from bs4 import BeautifulSoup

example = """<span>output1</span>This should be in overview section<span>output1</span><p>output 2</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')

res = []
for sibling in soup.select_one('p:has(strong)').previous_siblings:
    res.append(sibling.text)
    
res.reverse()
res = ' '.join(res)

print(res)

I used the p:has(strong) selector, which I got from @HedgeHog's answer. Thank you for that.
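
Note that joining sibling.text drops the markup, while the expected output in the question keeps the <p> tags. A minimal sketch of a variant that keeps the markup, assuming the same example structure and that str() on each sibling is an acceptable way to serialize it (the names marker, before and result are illustrative only):

```python
from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p>"""
soup = BeautifulSoup(example, "html.parser")

# First <p> that contains a <strong> somewhere inside it.
marker = soup.select_one("p:has(strong)")

result = ""
if marker is not None:
    # previous_siblings yields the nearest sibling first, so reverse
    # the list to restore document order. str() keeps the tags.
    before = [str(node) for node in marker.previous_siblings]
    before.reverse()
    result = "".join(before)

print(result)
```

If only a <strong> that is a direct child of the <p> should count, the selector p:has(> strong) may be a closer fit.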

CodePudding user response:

You could also select the other way around and work with find_previous:

e = soup.select_one('p:has(strong)')
print(e.find_previous('p').previous_element, e.find_previous('p'))

Example

from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')

e = soup.select_one('p:has(strong)')
print(e.find_previous('p').previous_element, e.find_previous('p'))

Output

This should be in before section <p>Content before</p>
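
The find_previous approach prints exactly two nodes, so it fits this particular example. For input with more content before the first title, a forward loop that stops at the first <p> containing <strong> may be closer to what the question describes. This is only a sketch: the function name content_before_first_bold_paragraph is made up for illustration, and it assumes the title paragraph sits at the top level of the parsed document.

```python
from bs4 import BeautifulSoup, Tag


def content_before_first_bold_paragraph(html):
    """Return the markup that precedes the first top-level <p> containing <strong>."""
    soup = BeautifulSoup(html, "html.parser")
    collected = []
    for node in soup.children:
        # Stop as soon as the first <p> with a <strong> inside is reached.
        if isinstance(node, Tag) and node.name == "p" and node.find("strong") is not None:
            break
        collected.append(str(node))
    return "".join(collected)


example = """<span>output1</span>This should be in overview section<p>output 2</p><p><strong>First Title</strong></p>Content of first title1"""
before = content_before_first_bold_paragraph(example)
print(before)
```

If no such paragraph exists, this sketch returns the whole document, which may or may not be the desired behaviour.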