I am writing a Telegram bot and need to extract links from HTML. I want to get the href of each match listed on https://www.hltv.org/matches
My current code is:
elif message.text == "Matches":
    url_news = "https://www.hltv.org/matches"
    response = requests.get(url_news)
    soup = BeautifulSoup(response.content, "html.parser")
    match_info = []
    match_items = soup.find("div", class_="upcomingMatchesSection")
    print(match_items)
    for item in match_items:
        match_info.append({
            "link": item.find("div", class_="upcomingMatch").text,
            "title": item["href"]
        })
I don't know how to get the links from this page body. I'd appreciate any help.
CodePudding user response:
What happens?
You iterate over match_items, but soup.find() returned only the single section element that contains the matches, not the matches themselves. The loop therefore walks the section's child nodes instead of one upcomingMatch element at a time.
How to fix?
Select the upcomingMatch elements inside the section instead and iterate over them:
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
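To make the difference concrete, here is a minimal sketch on a small inline HTML string. The markup is a simplified, hypothetical stand-in for the real page (only the class names are taken from this thread), so treat it as an illustration rather than the site's actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page has more nesting.
html = """
<div class="upcomingMatchesSection">
  <div class="upcomingMatch"><a class="match a-reset" href="/matches/1/a-vs-b">A vs B</a></div>
  <div class="upcomingMatch"><a class="match a-reset" href="/matches/2/c-vs-d">C vs D</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None). Iterating it walks its direct
# children, which also include whitespace text nodes.
section = soup.find("div", class_="upcomingMatchesSection")
children = list(section)

# select() returns a list with one Tag per matching element.
matches = soup.select("div.upcomingMatchesSection div.upcomingMatch")
links = [m.a["href"] for m in matches]

print(len(children), len(matches))
print(links)
```

With select(), every item in the loop is a match element, so item.a is available on each one.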
Getting the URL
Select the <a> inside each match and read its href attribute:
item.a["href"]
Example
from bs4 import BeautifulSoup
import requests

url_news = "https://www.hltv.org/matches"
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

match_info = []
# One element per match inside the upcoming-matches section
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
for item in match_items:
    match_info.append({
        # All text pieces of the match, joined with '|'
        "title": item.get_text('|', strip=True),
        # href of the first <a> inside the match element
        "link": item.a["href"]
    })
match_info
Output
[{'title': '09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds',
'link': '/matches/2352066/1win-vs-k23-pinnacle-fall-series-2'},
{'title': '09:00|bo3|INDE IRAE|Nemiga|Pinnacle Fall Series 2|Odds',
'link': '/matches/2352067/inde-irae-vs-nemiga-pinnacle-fall-series-2'},
{'title': '10:00|bo3|OPAA|Nexus|Malta Vibes Knockout Series 3|Odds',
'link': '/matches/2352207/opaa-vs-nexus-malta-vibes-knockout-series-3'},
{'title': '11:00|bo3|Checkmate|TBC|Funspark ULTI 2021 Asia Regional Series 3|Odds',
'link': '/matches/2352092/checkmate-vs-tbc-funspark-ulti-2021-asia-regional-series-3'},
{'title': '11:00|bo3|ORDER|Alke|ESEA Premier Season 38 Australia|Odds',
'link': '/matches/2352122/order-vs-alke-esea-premier-season-38-australia'},...]
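The links in the output above are relative paths. If the bot needs to send clickable URLs, one possible approach is to join them with the site's base URL using urljoin from the standard library. This is a sketch: BASE_URL and absolute_links are names introduced here for illustration, and the sample entry is copied from the output above.

```python
from urllib.parse import urljoin

# Assumed base URL, taken from the page address used in this thread.
BASE_URL = "https://www.hltv.org"

# One sample entry copied from the output above.
match_info = [
    {"title": "09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds",
     "link": "/matches/2352066/1win-vs-k23-pinnacle-fall-series-2"},
]

# Turn each relative path into an absolute URL.
absolute_links = [urljoin(BASE_URL, m["link"]) for m in match_info]
print(absolute_links[0])
# prints https://www.hltv.org/matches/2352066/1win-vs-k23-pinnacle-fall-series-2
```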
CodePudding user response:
You can try this out.
- All the match information is inside a <div> with the class name upcomingMatch.
- Select all of those <div> elements and, from each <div>, extract the match link, which is inside the <a> tag with the class name match.
Here is the code:
import requests
from bs4 import BeautifulSoup

url_news = "https://www.hltv.org/matches"
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.text, "lxml")

# Each upcoming match lives in its own <div class="upcomingMatch">
match_items = soup.find_all("div", class_="upcomingMatch")
for match in match_items:
    link = match.find('a', class_='match a-reset')['href']
    print(f'Link: {link}')
Link: /matches/2352235/malta-vibes-knockout-series-3-quarter-final-1-malta-vibes-knockout-series-3
Link: /matches/2352098/pinnacle-fall-series-2-quarter-final-2-pinnacle-fall-series-2
Link: /matches/2352236/malta-vibes-knockout-series-3-quarter-final-2-malta-vibes-knockout-series-3
Link: /matches/2352099/pinnacle-fall-series-2-quarter-final-3-pinnacle-fall-series-2
.
.
.
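If some match element happens to have no link, indexing the result of find() with ['href'] raises a TypeError because find() returns None. A defensive variant might look like the sketch below. The function name extract_match_links and the sample markup are hypothetical, and html.parser is used here only so the example runs without lxml installed.

```python
from bs4 import BeautifulSoup

def extract_match_links(html):
    """Return the href of every match link found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for match in soup.find_all("div", class_="upcomingMatch"):
        # find() returns None when the div contains no <a> with an href
        anchor = match.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links

# Hypothetical sample: one match with a link, one without.
sample = (
    '<div class="upcomingMatch">'
    '<a class="match a-reset" href="/matches/1/a-vs-b">A vs B</a></div>'
    '<div class="upcomingMatch"><span>no link yet</span></div>'
)
print(extract_match_links(sample))
# prints ['/matches/1/a-vs-b']
```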