My bs4.element.ResultSet has this format:
[<h3 >
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>,
<h3 >
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>]
I want to extract the title and href of each link and save them as a list of tuples, [(title, href), (title2, href2)], but I can't seem to do so.
My closest attempt was:
link = soup.find('h3',class_='foo1').find('a').get('title')
print(link)
But that only returns the first of the two or more elements. How can I extract the href and title of each one?
CodePudding user response:
Select your elements more specifically, e.g. with CSS selectors, and iterate over your ResultSet to get the attributes of each element as a list of tuples:
[(a.get('title'),a.get('href')) for a in soup.select('h3 a[href][title]')]
Example
from bs4 import BeautifulSoup
html = '''
<h3 >
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 >
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''
soup = BeautifulSoup(html, 'html.parser')
print([(a.get('title'),a.get('href')) for a in soup.select('h3 a[href][title]')])
Output
[('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
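For comparison with the attempt in the question, here is a minimal sketch of the same result using find_all() instead of a CSS selector. It assumes the h3 elements carry class="foo1", as the question's find() call suggests; the names first_title and pairs are only illustrative.

```python
from bs4 import BeautifulSoup

# Assumed markup: class="foo1" is inferred from the question's find() call.
html = '''
<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''
soup = BeautifulSoup(html, 'html.parser')

# find() stops at the first matching <h3>, which is why only one title came back.
first_title = soup.find('h3', class_='foo1').find('a').get('title')

# find_all() returns every matching <h3>; build one (title, href) tuple per heading.
pairs = []
for h3 in soup.find_all('h3', class_='foo1'):
    a = h3.find('a')
    if a is not None:  # skip headings that contain no link
        pairs.append((a.get('title'), a.get('href')))

print(first_title)
print(pairs)
```

The loop is longer than the select() one-liner, but it makes the difference between find() and find_all() explicit.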
CodePudding user response:
Code:
list(map(lambda link: (link.get("href"), link.get("title")), soup.select('h3.foo1>a[href][title]')))
Explanation:
soup.select('h3.foo1>a[href][title]')
Selects all the a elements that have both an href and a title attribute and that are a direct child of an h3 element with the foo1 class.
map(lambda link: ..., ...)
Applies the lambda to each of those a elements. A ResultSet is a list subclass and has no .map() method, so the built-in map() is used and wrapped in list() to get a list back.
(link.get("href"), link.get("title"))
Makes a tuple containing the link's href and title. Swap the two calls to get (title, href) as in the question.
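A small runnable sketch of what this selector keeps and what it drops. The markup is hypothetical: class="foo1" is assumed from the question, and the last two h3 elements are added only to show links the selector excludes. The built-in map() is used here because a ResultSet behaves like a plain Python list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class="foo1" is assumed; the last two headings exist only
# to show which links the selector leaves out.
html = '''
<h3 class="foo1"><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3 class="foo1"><a href="OtherLink" title="OtherTitle">OtherTitle</a></h3>
<h3 class="foo1"><span><a href="nested" title="nestedTitle">Nested</a></span></h3>
<h3 class="foo1"><a href="noTitle">No title</a></h3>
'''
soup = BeautifulSoup(html, 'html.parser')

# '>' requires the <a> to be a direct child of the <h3>;
# [href][title] requires both attributes to be present.
links = soup.select('h3.foo1 > a[href][title]')

# Turn each matching <a> into an (href, title) tuple.
result = list(map(lambda link: (link.get('href'), link.get('title')), links))
print(result)
```

The link wrapped in a span is skipped because it is not a direct child of the h3, and the last link is skipped because it has no title attribute.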