I am using BeautifulSoup to extract data from HTML. Given a search string, I need to get every tag whose text matches it, together with that text, whatever the tag name is.
As a sample, consider the following HTML:
<h1>Hello</h1>
<h1>Python Program</h1>
<span class = true>Geeks</span>
<span class = false>Geeks New</span>
<li class = 1 >Python Program</li>
<li class = 2 >Python Code</li>
<li class = 3 >Hello</li>
<table>
<tr>Website</tr>
</table>
If the tag name is known, the following code returns the whole tag together with its text:
pattern = 'Hello'
text1 = soup.find_all('li', text = pattern)
print(text1)
This prints:
[<li class="3">Hello</li>]
But when I search for 'Hello', I need every tag that contains 'Hello', whatever its name, like this:
[<h1>Hello</h1>, <li class="3">Hello</li>]
CodePudding user response:
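You can pass True instead of the 'li' tag name. True matches every tag, so only the text filter decides what is returned.
from bs4 import BeautifulSoup
html = '''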
<h1>Hello</h1>
<h1>Python Program</h1>
<span class = true>Geeks</span>
<span class = false>Geeks New</span>
<li class = 1 >Python Program</li>
<li class = 2 >Python Code</li>
<li class = 3 >Hello</li>
<table>
<tr>Website</tr>
</table>
'''
pattern = 'Hello'
soup = BeautifulSoup(html, "html.parser")
# string= is the current name of the argument; text= is the older, deprecated spelling
text1 = soup.find_all(True, string=pattern)
print(text1)
Output:
[<h1>Hello</h1>, <li class="3">Hello</li>]
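Note that passing a plain string matches only tags whose whole text equals the pattern. If partial matches are also wanted, a compiled regular expression can be passed instead. The sketch below is one way to wrap both cases; the helper name find_tags_with_text and its exact parameter are made up for illustration, and the HTML is a trimmed version of the sample with quoted attribute values.

```python
import re
from bs4 import BeautifulSoup

html = '''
<h1>Hello</h1>
<h1>Python Program</h1>
<li class="1">Python Program</li>
<li class="2">Python Code</li>
<li class="3">Hello</li>
'''

soup = BeautifulSoup(html, "html.parser")


def find_tags_with_text(soup, pattern, exact=True):
    """Hypothetical helper: return every tag whose .string matches pattern."""
    if exact:
        # A plain string must equal the tag's whole text.
        return soup.find_all(True, string=pattern)
    # A compiled regex is applied with search(), so substrings match too.
    return soup.find_all(True, string=re.compile(re.escape(pattern)))


exact_names = [tag.name for tag in find_tags_with_text(soup, "Hello")]
partial_names = [tag.name for tag in find_tags_with_text(soup, "Python", exact=False)]
print(exact_names)    # tags whose text is exactly "Hello"
print(partial_names)  # tags whose text contains "Python"
```

Both variants only consider tags that have a single text child, because the filter is applied to each tag's .string.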
CodePudding user response:
You could use a CSS selector that checks whether an element contains a string:
soup.select(':-soup-contains("Hello")')
Example
from bs4 import BeautifulSoup
html ='''
<h1>Hello</h1>
<h1>Python Program</h1>
<span class = true>Geeks</span>
<span class = false>Geeks New</span>
<li class = 1 >Python Program</li>
<li class = 2 >Python Code</li>
<li class = 3 >Hello</li>
<table>
<tr>Website</tr>
</table>
'''
pattern = 'Hello'
soup = BeautifulSoup(html, 'html.parser')
# Quote the pattern so that strings containing spaces are still valid selectors
print(soup.select(f':-soup-contains("{pattern}")'))
Output
[<h1>Hello</h1>, <li class="3">Hello</li>]
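One thing to keep in mind: :-soup-contains checks all text below an element, so on nested markup the wrapping elements match as well. If only the innermost elements are wanted, :-soup-contains-own may be a better fit, since it looks only at text directly inside the element. A small sketch, assuming a soupsieve version that supports both pseudo-classes (2.1 or later) and using made-up nested HTML rather than the sample above:

```python
from bs4 import BeautifulSoup

# Illustrative markup with nesting; not part of the original sample.
html = '''
<div><p>Hello <b>World</b></p></div>
<li class="3">Hello</li>
'''

soup = BeautifulSoup(html, "html.parser")

# Matches any element with "Hello" somewhere in its descendant text,
# so the wrapping <div> is returned along with <p> and <li>.
any_depth = [tag.name for tag in soup.select(':-soup-contains("Hello")')]

# Matches only elements that hold "Hello" in their own direct text nodes.
own_text = [tag.name for tag in soup.select(':-soup-contains-own("Hello")')]

print(any_depth)
print(own_text)
```

Both selectors do substring matching, so an element whose text is 'Hello World' would also be selected.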