I want to collect all elements that come before the first <p><strong>
and exit the loop once it is found.
from bs4 import BeautifulSoup
example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')
for data in soup:
    print(data.previous_sibling)
    print(data.nextSibling.name)
    if data.nextSibling.name == '<p><strong>':
        print('found and add before content in variable')
The output variable should contain:
This should be in before section<p>Content before</p>
Edit: I also tried the code below:
res = []
for sibling in soup.find('p').previous_siblings:
    res.append(sibling.text)
res.reverse()
res = ' '.join(res)
print(res)
It should match <p><strong>, not just <p>, and I am not sure how to do that.
CodePudding user response:
I found a solution and am posting it here in case others find it useful:
from bs4 import BeautifulSoup
example = """<span>output1</span>This should be in overview section<span>output1</span><p>output 2</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')
res = []
# previous_siblings yields the nodes nearest-first, so the list is reversed afterwards
for sibling in soup.select_one('p:has(strong)').previous_siblings:
    res.append(sibling.text)
res.reverse()
res = ' '.join(res)
print(res)
I used the p:has(strong) selector, which I got from @HedgeHog's answer. Thank you for that.
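If the markup itself has to be kept, as in the expected output of the question, a variation of the same idea could use str() on each sibling instead of .text. This is only a minimal sketch: it assumes the wanted content consists of direct siblings of the first p:has(strong), and the names first_title and before are made up for illustration.

```python
from bs4 import BeautifulSoup

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p>"""
soup = BeautifulSoup(example, 'html.parser')

# First <p> that contains a <strong> (hypothetical variable name)
first_title = soup.select_one('p:has(strong)')

# previous_siblings yields nodes nearest-first, so reverse to restore document order;
# str() keeps the tags, whereas .text would drop them
before = ''.join(str(s) for s in reversed(list(first_title.previous_siblings)))
print(before)
```

Joining with an empty string keeps the text and the tags exactly adjacent, as they are in the input.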
CodePudding user response:
You could also select the other way around and work with find_previous:
e = soup.select_one('p:has(strong)')
print(e.find_previous('p').previous, e.find_previous('p'))
Example
from bs4 import BeautifulSoup
example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p><p><strong>Second title</strong></p><p>Content of second title</p></strong>"""
soup = BeautifulSoup(example, 'html.parser')
e = soup.select_one('p:has(strong)')
print(e.find_previous('p').previous, e.find_previous('p'))
Output
This should be in before section <p>Content before</p>
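The find_previous call above picks up one <p> and the single node before it. For an arbitrary number of nodes, the loop-and-exit approach from the question might be sketched as below. It assumes the content sits at the top level of the parsed document (true for this example with html.parser) and that the <strong> is a direct child of the <p>. The names before and result are made up for illustration.

```python
from bs4 import BeautifulSoup, Tag

example = """This should be in before section<p>Content before</p><p><strong>First Title</strong></p>Content of first title1<p>Content of first title2</p>"""
soup = BeautifulSoup(example, 'html.parser')

before = []
for node in soup.contents:
    # Stop at the first <p> that has a <strong> as a direct child
    if (isinstance(node, Tag) and node.name == 'p'
            and node.find('strong', recursive=False) is not None):
        break
    # str() keeps the markup of tags and the raw text of strings
    before.append(str(node))

result = ''.join(before)
print(result)
```

Checking isinstance(node, Tag) first skips the plain text nodes, which have no children to search.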