in BeautifulSoup how to split the soup by certain words?

Time:10-25

I'm using bs4 to scrape a document with a format like the one below, and I only want the a tag elements that appear above text2. How can I do that?

<h1>text1</h1>
<a href="link">link</a>

<h1>text2</h1>
<a href="link"></a>

I could turn the soup into a string and split it, but I'm not sure I can turn the result back into soup, and I need to call soup.find_all('a') afterwards.
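The string-splitting idea from the question does work, because BeautifulSoup will parse any HTML string, including a fragment of a larger document. A minimal sketch, assuming the marker heading serializes exactly as `<h1>text2</h1>` (no attributes, no extra whitespace inside the tag). If the cut lands inside an enclosing element, the fragment has unclosed tags; html.parser tolerates that, but the tree may not match the original.

```python
from bs4 import BeautifulSoup

html = """
<h1>text1</h1>
<a href="link">link</a>

<h1>text2</h1>
<a href="link"></a>"""

soup = BeautifulSoup(html, "html.parser")

# Serialize the soup and cut it at the marker heading.
# Assumption: the heading serializes exactly as "<h1>text2</h1>".
# If the marker is not found, partition() returns the whole string as "before".
before, _, _ = str(soup).partition("<h1>text2</h1>")

# Re-parse the first half: the result is an ordinary BeautifulSoup object,
# so find_all() and the other search methods work on it as usual.
top = BeautifulSoup(before, "html.parser")
links = top.find_all("a")
print(links)  # [<a href="link">link</a>]
```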

Answer:

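Try find_all_previous() on the text2 heading. It returns the matching elements that appear before that tag in the document, nearest first. Pass "a" to keep only the links; with no filter, the text1 heading is returned as well.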

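from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<h1>text1</h1>
<a href="link">link</a>

<h1>text2</h1>
<a href="link"></a>""", "html.parser")

# Locate the marker heading, then collect only the <a> tags that come before it.
# (string= is the current name of the older text= argument.)
marker = soup.find("h1", string="text2")
print(marker.find_all_previous("a"))

[<a href="link">link</a>]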
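When there are more than two headings, the same idea generalizes to splitting the soup into sections: walk the `h1` and `a` tags in document order and file each link under the most recent heading. A minimal sketch; the helper `links_by_heading` is hypothetical (not part of bs4), links before the first `h1` are skipped, and headings with identical text are merged into one section.

```python
from bs4 import BeautifulSoup

html = """
<h1>text1</h1>
<a href="link">link</a>

<h1>text2</h1>
<a href="link"></a>"""

soup = BeautifulSoup(html, "html.parser")


def links_by_heading(soup):
    """Hypothetical helper: map each <h1> text to the <a> tags after it."""
    sections = {}
    current = None  # no heading seen yet; links here are skipped
    # find_all() with a list of names matches any of them, in document order.
    for tag in soup.find_all(["h1", "a"]):
        if tag.name == "h1":
            current = tag.get_text(strip=True)
            sections.setdefault(current, [])  # duplicate headings share a list
        elif current is not None:
            sections[current].append(tag)
    return sections


sections = links_by_heading(soup)
print(sections["text1"])  # [<a href="link">link</a>]
print(sections["text2"])  # [<a href="link"></a>]
```

Only the links above text2 are then `sections["text1"]`, and the result stays a list of Tag objects, so no re-parsing is needed.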