I am trying to grab the rankings history link from a URL using the following scraping code:
import requests
from bs4 import BeautifulSoup
url = 'https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/'
pageTree = requests.get(url, headers=headers)
Soup = BeautifulSoup(pageTree.content, 'html.parser')
past_link = Soup.find_all('ul', {'class':'ranks-list'})
past_link
I was able to generate this output:
[<ul >
<li>
<b>Natl.</b>
<a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool">
<strong>1</strong>
</a>
<a href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">
History
</a>
</li>
<li>
<b>PRO</b>
<a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&Position=PRO">
<strong>1</strong>
</a>
</li>
<li>
<b>GA</b>
<a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&State=GA">
<strong>1</strong>
</a>
</li>
<li>
<b>All-Time</b>
<a href="https://247sports.com/Sport/Football/AllTimeRecruitRankings/">
<strong>6</strong>
</a>
</li>
</ul>]
But going any further, for example by calling past_link.find_all('a'), raised errors. What should the next step be from here? Any assistance is truly appreciated. Thanks in advance.
Answer:
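The error occurs because find_all returns a ResultSet (a list of tags), and that list has no find_all method of its own. To get the rankings history link from that page, you can select the link directly, as in the following example: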
import requests
from bs4 import BeautifulSoup
url = "https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
history_link = soup.select_one(".rank-history-link")["href"]
print(history_link)
Prints:
https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/
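If you would rather keep the find_all approach from the question, a minimal sketch is to loop over the returned ResultSet and call find_all('a') on each tag instead of on the list. To keep it runnable without a network request, the example parses a trimmed copy of the markup from the question. The class attributes in that snippet are assumptions inferred from the selectors used in this thread ('ranks-list' and '.rank-history-link'), and the live page may differ.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (two <li> items kept).
# The class attributes are assumed from the selectors used in this thread.
html = """
<ul class="ranks-list">
<li>
<b>Natl.</b>
<a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool"><strong>1</strong></a>
<a class="rank-history-link" href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">History</a>
</li>
<li>
<b>All-Time</b>
<a href="https://247sports.com/Sport/Football/AllTimeRecruitRankings/"><strong>6</strong></a>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tag objects), so call
# find_all("a") on each element rather than on the list itself.
rank_lists = soup.find_all("ul", {"class": "ranks-list"})
links = [a["href"] for ul in rank_lists for a in ul.find_all("a", href=True)]

# Keep only the link(s) whose path ends in RecruitRankHistory.
history_links = [
    href for href in links if href.rstrip("/").endswith("RecruitRankHistory")
]
print(history_links)
```

Filtering on the URL path is just one option; it avoids depending on a class name, at the cost of depending on the URL pattern instead.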