Do you know how to search for specific text with Python's BeautifulSoup, to find the tags (or better, the full path to the tags) that contain some string?
The common way of using BS4 is, for example:
import requests
from bs4 import BeautifulSoup
url = "https://elementy.ru/novosti_nauki"
website = requests.get(url)
results = BeautifulSoup(website.content, 'html.parser')
and then you can query for all tags with certain properties, such as tag name, class, etc.
However, I want to go a different way and find the location of a specific text inside this structure.
Doing that on the plain HTML text is really inconvenient.
CodePudding user response:
You could use a CSS selector, or more exactly the Soup Sieve pseudo-class :-soup-contains(), to search for tags that contain a certain string:
soup.select(':-soup-contains("that")')
or, as an alternative, re.compile():
import re
soup('p', string=re.compile('that'))
Note: soup.find_all(string="that") won't work here, because a plain string argument has to match the whole text of a node exactly.
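To make that difference concrete, here is a minimal sketch. The two-paragraph HTML snippet is made up for illustration:

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page
html = "<p>some content</p><p>pattern that we like</p>"
soup = BeautifulSoup(html, "html.parser")

# A plain string has to equal the whole text node, so nothing matches
exact = soup.find_all(string="that")

# A compiled regex is applied as a search, so a substring is enough
partial = soup.find_all(string=re.compile("that"))

print(exact)    # []
print(partial)  # ['pattern that we like']
```

Note that find_all(string=...) returns the matching text nodes rather than the tags; the enclosing tag of each one is available as .parent.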
Example
from bs4 import BeautifulSoup
html = '''
<p>some content</p>
<p>pattern that we like</p>
<p>some content</p>
'''
soup = BeautifulSoup(html, 'html.parser')
soup.select(':-soup-contains("that")')
Output
[<p>pattern that we like</p>]
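The question also asks for the full path to the matching tags. One possible approach, sketched here on the assumption that "path" simply means the chain of parent tag names, is to find the matching text nodes and walk up through .parents. The helper name tag_path and the sample markup are hypothetical and only serve the illustration:

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page
html = """
<html><body>
<div id="news"><p>some content</p><p>pattern that we like</p></div>
</body></html>
"""

def tag_path(tag):
    """Hypothetical helper: build a path like 'html > body > div > p'."""
    names = [tag.name]
    for parent in tag.parents:
        # Stop at the BeautifulSoup object, which is the document root
        if isinstance(parent, BeautifulSoup):
            break
        names.append(parent.name)
    return " > ".join(reversed(names))

soup = BeautifulSoup(html, "html.parser")

# Each match is a text node; its .parent is the tag that directly holds it
paths = [tag_path(node.parent) for node in soup.find_all(string=re.compile("that"))]
print(paths)  # ['html > body > div > p']
```

If several sibling tags share a name, this path is ambiguous; adding an index or the id/class attributes to each step would be a natural extension.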