[<div ><ol>
<li><a href="https://www.geeksforgeeks.org/array-rotation/">Program for array rotation</a></li></ol></div>]
From the ResultSet above (type <class 'bs4.element.ResultSet'>), I want to extract the text "Program for array rotation" and the link "https://www.geeksforgeeks.org/array-rotation/". How can I do that using Python?
Answer:
If there is only a single link you want to extract, you could use:
a = soup.select_one('li a[href]')  # first <a> with an href inside an <li>
link = a['href']
text = a.text
print(link, text)
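For a self-contained version of the single-link case, here is a minimal sketch that parses the HTML from the question directly. It assumes the `html` string below matches the markup behind the ResultSet (with the list item closed properly). Since `select_one` returns `None` when nothing matches, the result is checked before indexing it.

```python
from bs4 import BeautifulSoup

# HTML from the question (assumed to be the markup behind the ResultSet)
html = '''
<div><ol>
<li><a href="https://www.geeksforgeeks.org/array-rotation/">Program for array rotation</a></li>
</ol></div>
'''

soup = BeautifulSoup(html, 'html.parser')

# select_one returns the first matching Tag, or None if nothing matches
a = soup.select_one('li a[href]')
if a is not None:
    link = a['href']
    text = a.get_text(strip=True)  # strip surrounding whitespace from the link text
    print(link, text)
```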
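To be more generic, you could select all the <a> elements and then iterate over the ResultSet with a dict comprehension to get unique href values. Because the dict is keyed by href, when two links share the same href, the text of the last one wins. This also works for a single item: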
from bs4 import BeautifulSoup

html = '''
<div class="rotation"><ol>
<li><a href="https://www.geeksforgeeks.org/array-rotation/">Program for array rotation1</a></li>
<li><a href="https://www.geeksforgeeks.org/array-rotation/">Program for array rotation2</a></li></ol></div>
'''
soup = BeautifulSoup(html, 'html.parser')
# href -> text; a repeated href keeps the text of its last occurrence
{a['href']: a.text for a in soup.select('div.rotation li a[href]')}
Out:
{'https://www.geeksforgeeks.org/array-rotation/': 'Program for array rotation2'}
Or use a list comprehension to keep every href/text pair, including duplicates:
[{a['href']:a.text} for a in soup.select('div.rotation li a[href]')]
Out:
[{'https://www.geeksforgeeks.org/array-rotation/': 'Program for array rotation1'},
{'https://www.geeksforgeeks.org/array-rotation/': 'Program for array rotation2'}]
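The question starts from a ResultSet rather than from the whole soup. A ResultSet behaves like a list of Tag objects, and each Tag supports `.select()`, so you can search inside every element. This is a minimal sketch; how the original ResultSet was produced is not shown in the question, so the `find_all('div')` call below is an assumption.

```python
from bs4 import BeautifulSoup

html = '''
<div><ol>
<li><a href="https://www.geeksforgeeks.org/array-rotation/">Program for array rotation</a></li>
</ol></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Assumption: the ResultSet came from a find_all() call like this one;
# the question does not show how it was created.
result_set = soup.find_all('div')

# Each element of the ResultSet is a Tag, so it can be searched again
pairs = [
    (a['href'], a.get_text(strip=True))
    for tag in result_set
    for a in tag.select('li a[href]')
]
print(pairs)
```

If the ResultSet contains nested elements (for example a div inside another div), the same link can be found more than once, so deduplicate the pairs if that matters.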