How to find a tag by specific text in Python BeautifulSoup?

Time:09-06

Do you know how to search for specific text with Python's BeautifulSoup, to find the tags (or, better, the full path to the tags) that contain some string?

A common way to use BS4 is, for example:

import requests
from bs4 import BeautifulSoup

url = "https://elementy.ru/novosti_nauki"


website = requests.get(url)
results = BeautifulSoup(website.content, 'html.parser')

and then you can query for all tags with certain properties, such as header, class, etc.

However, I want to go a different way and find the location of a specific text inside this structure.

Doing this with the plain HTML text is really inconvenient.

CodePudding user response:

You could use a CSS selector, or more exactly the Soup Sieve pseudo-class :-soup-contains(), to search for tags that contain a certain string:

soup.select(':-soup-contains("that")')

or, as an alternative, re.compile():

import re
soup('p', string=re.compile('that'))

Note: soup.find_all(string="that") won't work here, because it only matches text that is exactly equal to the given string.
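
The difference between exact and pattern matching can be seen in a short sketch. The sample HTML and the variable names are made up for illustration:

```python
import re
from bs4 import BeautifulSoup

html = "<p>some content</p><p>pattern that we like</p>"
soup = BeautifulSoup(html, "html.parser")

# Exact match: no text node is equal to "that", so nothing is returned.
exact = soup.find_all(string="that")

# Pattern match: returns the text nodes whose content contains "that".
partial = soup.find_all(string=re.compile("that"))

# .parent gives the tag that holds each matching text node.
tags = [node.parent for node in partial]
print(exact, tags)
```

Searching with string= alone returns text nodes rather than tags, which is why .parent is used to get back to the enclosing tag.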

Example

from bs4 import BeautifulSoup

html = '''
<p>some content</p>
<p>pattern that we like</p>
<p>some content</p>
'''
soup = BeautifulSoup(html, 'html.parser')

soup.select(':-soup-contains("that")')

Output

[<p>pattern that we like</p>]
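
If the full path to the tag is also wanted, as the question asks, one possible approach is to walk up from the matching text node through its parents. This is only a sketch: tag_path is a hypothetical helper, not part of BeautifulSoup, and the nested HTML is an invented sample.

```python
import re
from bs4 import BeautifulSoup

html = """
<div id="main">
  <ul>
    <li>some content</li>
    <li>pattern <b>that</b> we like</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def tag_path(tag):
    """Hypothetical helper: tag names from the outermost ancestor down to tag."""
    names = [tag.name]
    for parent in tag.parents:
        # The BeautifulSoup object itself is the last parent; skip it.
        if parent.name == "[document]":
            break
        names.append(parent.name)
    return " > ".join(reversed(names))


# Find the text nodes that contain "that", then build a path for each parent tag.
matches = soup.find_all(string=re.compile("that"))
paths = [tag_path(node.parent) for node in matches]
print(paths)  # ['div > ul > li > b']
```

Searching text nodes gives only the innermost tag. With nested markup like this, :-soup-contains() would also match the ancestors (div, ul, li), because it looks at the text of descendants as well.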