How to find href by title using select()?

I need to find the href of a page that has a title I provide. For example, given the title Italy, I want to get the href of the Wikipedia page that has this title. This is my code:

                if status_code_finish == 200:
                    list_of_titles_finish = [title.get('title') for title in
                                             soup_finish.find(f'title*="{finish}"]')]

The argument finish is Italy.

How can I do something like this:

title.get('title') for title in soup.finish.find(f'title= {finish}')

CodePudding user response：

You could use CSS selectors to select your elements more specifically and in a single statement, without chaining several find() or find_all() calls. Simply use an attribute selector with the *= operator, which means "contains":

pattern = 'Italy'    
[a.get('title') for a in soup.select(f'a[title*="{pattern}"]')]

Or with a list of patterns:

pattern = ['Italy','Finland']
set(a.get('title') for p in pattern for a in soup.select(f'a[title*="{p}"]'))

Example

from bs4 import BeautifulSoup
html = '''
<a href="/wiki/Italy" title="Italy">Italy</a>
<a href="/wiki/Italy" title="Italy Finland">Italy Finland</a>
<a href="/wiki/Finland" title="Finland">Finland</a>
'''
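# name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

pattern = 'Italy'
# *= matches any title that contains the pattern
print([a.get('title') for a in soup.select(f'a[title*="{pattern}"]')])
# ['Italy', 'Italy Finland']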
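Since the question asks for the href rather than the title, the same selector can be combined with the href attribute. The following is only a minimal sketch: the sample HTML is adapted from the example above (the second link's href is changed so the two results can be told apart), and hrefs_by_title is a hypothetical helper name, not part of BeautifulSoup. It assumes the title contains no double quotes, since the title is pasted straight into the selector.

```python
from bs4 import BeautifulSoup

html = '''
<a href="/wiki/Italy" title="Italy">Italy</a>
<a href="/wiki/Italy_Finland" title="Italy Finland">Italy Finland</a>
<a href="/wiki/Finland" title="Finland">Finland</a>
'''
soup = BeautifulSoup(html, 'html.parser')


def hrefs_by_title(soup, title, exact=True):
    """Hypothetical helper: hrefs of all <a> tags whose title matches."""
    # "=" requires the whole title to match, "*=" only requires it to contain the text
    op = '=' if exact else '*='
    # [href] skips anchors that have a title but no href
    selector = f'a[href][title{op}"{title}"]'
    return [a['href'] for a in soup.select(selector)]


print(hrefs_by_title(soup, 'Italy'))               # ['/wiki/Italy']
print(hrefs_by_title(soup, 'Italy', exact=False))  # ['/wiki/Italy', '/wiki/Italy_Finland']
```

On Wikipedia these hrefs are relative (for example /wiki/Italy), so they would still need to be joined with the site's base URL, for instance with urllib.parse.urljoin.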

CodePudding user response：

You can use soup.find_all() with the attribute filter title=finish to build a list of the tags whose title is exactly finish. After that, you can iterate through the list and read the href of each tag.
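A rough sketch of that approach, assuming a small hand-written HTML snippet in place of the real Wikipedia page; the variable names links and hrefs are chosen only for illustration:

```python
from bs4 import BeautifulSoup

html = '''
<a href="/wiki/Italy" title="Italy">Italy</a>
<a href="/wiki/Italy" title="Italy Finland">Italy Finland</a>
<a href="/wiki/Finland" title="Finland">Finland</a>
'''
soup = BeautifulSoup(html, 'html.parser')

finish = 'Italy'
# A keyword argument to find_all() filters on that attribute;
# a plain string must equal the attribute value exactly.
links = soup.find_all('a', title=finish)

# Iterate through the matches and collect each href.
hrefs = [link.get('href') for link in links]
print(hrefs)  # ['/wiki/Italy']
```

Unlike the *= selector, this does not match "Italy Finland", because a string filter compares against the whole attribute value.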

CodePudding user response：

To find the title of a webpage using the soup.select method in Python, you can use the following code:

from bs4 import BeautifulSoup
import requests

url = 'https://www.example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

title = soup.select('title')[0].get_text()
print(title)

This code first imports the necessary modules, then it sends a GET request to the specified URL using the requests module. The response is then parsed using the BeautifulSoup module, and the title is selected using the soup.select method. The [0] at the end is used to select the first element in the list returned by soup.select. The get_text() method is then used to extract the text within the title tag.

Note that the code above assumes there is only one title tag in the HTML document. If there is more than one, you need to loop through the title tags, or select a specific one by adding a class, id, or other attribute to the selector passed to select().
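As a small illustration of that note, here is a sketch with a made-up document that contains two title tags (one in the head, one inside an inline SVG). It assumes the page title is the one directly under head, and uses select_one(), which returns the first match or None instead of a list:

```python
from bs4 import BeautifulSoup

# Made-up document with two <title> tags
html = '''
<html>
<head><title>Page</title></head>
<body><svg><title>Icon</title></svg></body>
</html>
'''
soup = BeautifulSoup(html, 'html.parser')

# Loop over every title tag in the document
all_titles = [t.get_text() for t in soup.select('title')]
print(all_titles)  # ['Page', 'Icon']

# Narrow the selector to the title directly under <head>
head_title = soup.select_one('head > title')
page_title = head_title.get_text() if head_title is not None else None
print(page_title)  # Page
```

Checking for None avoids the IndexError that soup.select('title')[0] would raise on a page with no title tag.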