How can I extract href and title from this HTML?

My `bs4.element.ResultSet` has this format:

    [<h3 >
    <a href="someLink" title="someTitle">SomeTitle</a>
    </h3>,
    <h3 >
    <a href="OtherLink" title="OtherTitle">OtherTitle</a>
    </h3>]

I want to extract the title and href of each link and save them as a list of tuples, `[(title, href), (title2, href2)]`, but I can't get it to work.

My closest attempt was:

    link = soup.find('h3',class_='foo1').find('a').get('title')
    print(link)

but that only returns the title of the first of the two (or more) elements. How can I extract every href and title?
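`find` stops at the first match, which is why the attempt above returns only one title. Below is a minimal sketch of the same idea using `find_all`, which returns every match. It assumes each `h3` carries the class `foo1` (the class used in the attempt), and the sample `html` string is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class "foo1" is assumed from the attempt above
html = """
<h3 class="foo1"><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3 class="foo1"><a href="OtherLink" title="OtherTitle">OtherTitle</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
# find_all returns every matching <h3>; find would return only the first one
for h3 in soup.find_all("h3", class_="foo1"):
    a = h3.find("a")  # first <a> inside this <h3>, or None if there is none
    if a is not None:
        pairs.append((a.get("title"), a.get("href")))

print(pairs)
```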

Answer:

Select your elements more specifically, e.g. with CSS selectors, and iterate over the result to collect each element's attributes as a list of tuples:

    [(a.get('title'), a.get('href')) for a in soup.select('h3 a[href][title]')]

Example

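    from bs4 import BeautifulSoup

    html = '''
    <h3 >
        <a href="someLink" title="someTitle">SomeTitle</a>
    </h3>
    <h3 >
        <a href="OtherLink" title="OtherTitle">OtherTitle</a>
    </h3>
    '''
    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(html, 'html.parser')

    # (title, href) for every <a> inside an <h3> that has both attributes
    print([(a.get('title'), a.get('href')) for a in soup.select('h3 a[href][title]')])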

Output

    [('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
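The `[href][title]` part of the selector matters when some links lack one of the attributes: `a.get('title')` returns `None` for a missing attribute instead of raising, so without the filter such links would appear with `None` in the tuple. A small sketch, using hypothetical markup with a third link that has no `title`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the third link has no title attribute
html = """
<h3><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3><a href="OtherLink" title="OtherTitle">OtherTitle</a></h3>
<h3><a href="NoTitleLink">NoTitle</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

# Requires only href: the title-less link is kept, with None as its title
loose = [(a.get("title"), a.get("href")) for a in soup.select("h3 a[href]")]

# Requires both attributes: the title-less link is skipped
strict = [(a.get("title"), a.get("href")) for a in soup.select("h3 a[href][title]")]

print(loose)
print(strict)
```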

Answer:

Code:

    list(map(lambda link: (link.get("title"), link.get("href")), soup.select('h3.foo1 > a[href][title]')))

Explanation:

    soup.select('h3.foo1 > a[href][title]')

Selects every a element that has both an href and a title attribute and is a direct child of an h3 element with the class foo1.

    map(lambda link: ..., ...)

Applies the lambda to each of those a elements in turn.

    (link.get("title"), link.get("href"))

Builds a tuple containing the link's title and href, in the (title, href) order the question asks for.

    list(...)

In Python 3, map returns a lazy iterator, so list() collects the tuples into a list.
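The `>` in `h3.foo1 > a` is the CSS child combinator: it matches only an `a` that sits directly inside the `h3`, whereas a plain space (the descendant combinator) also matches links nested deeper. A sketch with hypothetical markup in which the second link is wrapped in a `span`; the helper `pairs` is illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link is nested inside a <span>
html = """
<h3 class="foo1"><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3 class="foo1"><span><a href="OtherLink" title="OtherTitle">OtherTitle</a></span></h3>
"""
soup = BeautifulSoup(html, "html.parser")

def pairs(selector):
    # Hypothetical helper: (title, href) tuples for every element matching selector
    return list(map(lambda link: (link.get("title"), link.get("href")), soup.select(selector)))

child = pairs("h3.foo1 > a[href][title]")     # only links directly inside an h3
descendant = pairs("h3.foo1 a[href][title]")  # links at any depth inside an h3
print(child)
print(descendant)
```

If the real page wraps its links in extra elements, the descendant form (a space instead of `>`) is the safer choice.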