Python: Get element next to href

Python code:

import string
import urllib.request

from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/players/'
initial = list(string.ascii_lowercase)
initial_url = [url + i for i in initial]
html_initial = [urllib.request.urlopen(i).read() for i in initial_url]
soup_initial = [BeautifulSoup(i, 'html.parser') for i in html_initial]
tags_initial = [i('a') for i in soup_initial]
print(tags_initial[0][50])

Results example:

<a href="/players/a/abdursh01.html">Shareef Abdur-Rahim</a>

From the example above, I want to extract the player's name, which is 'Shareef Abdur-Rahim', and I want to do the same for every tag in every list in tags_initial.

Does anyone have an idea?

CodePudding user response：

There are probably better ways but I'd do it like this:

html = "<a href=\"/teams/LAL/2021.html\">Los Angeles Lakers</a>"

index = html.find("a href")
index = html.find(">", index) + 1
index_end = html.find("<", index)

print(html[index:index_end])

If you're using a scraper library it probably has a similar function built-in.
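Since the question already uses BeautifulSoup, here is a minimal sketch of the built-in route, assuming each anchor tag holds only the player's name as text. The helper name extract_names is made up for illustration; get_text() and find_all() are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup


def extract_names(html):
    """Return the visible text of every <a> tag in an HTML string.

    extract_names is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # get_text() returns the text inside the tag, without the markup
    return [a.get_text() for a in soup.find_all('a')]


sample = '<a href="/players/a/abdursh01.html">Shareef Abdur-Rahim</a>'
print(extract_names(sample))
```

For the nested lists in the question, the same idea should work as a nested comprehension: [[a.get_text() for a in tags] for tags in tags_initial].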

CodePudding user response：

Could you modify your post by adding your code so that we can help you better?

Maybe this will help:

name = soup.findAll(YOUR_SELECTOR)[0].string

UPDATE

import re
import string
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'https://www.basketball-reference.com/players/'
# Alphabet
initial = list(string.ascii_lowercase)
datas = []
# URLS
urls = [url + i for i in initial]
for url in urls:
    # Soup Object
    soup = BeautifulSoup(urlopen(url), 'html.parser')
    # Players link
    url_links = soup.findAll("a", href=re.compile("players"))
    for link in url_links:
        # Player name
        datas.append(link.string)

print("datas : ", datas)

After this runs, "datas" contains all the player names, but I advise a little post-processing afterwards to remove erroneous entries such as "..." and any duplicates.
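One possible way to do that cleanup is sketched below. The helper name clean_names and the sample list are made up for illustration, and the filter only covers the cases mentioned here: "..." placeholders, empty strings, duplicates, and the None that link.string returns when a tag has more than one child.

```python
def clean_names(names):
    """Drop None, empty and '...' entries and duplicates, keeping first-seen order.

    clean_names is a hypothetical helper written for this example.
    """
    seen = set()
    cleaned = []
    for name in names:
        if name is None:
            # link.string is None when the tag has more than one child
            continue
        name = name.strip()
        if not name or name == '...':
            continue
        if name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


# Illustrative input only, not real scraped output
datas = ['Shareef Abdur-Rahim', None, '...', 'Alaa Abdelnaby', 'Shareef Abdur-Rahim']
print(clean_names(datas))
```

Because the regex "players" also matches navigation links on the page, some non-player text may still get through and need its own filter.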