How can I get the href from a row?

I am building a Telegram bot and need to extract links from HTML. I want to get the href of each match from this website: https://www.hltv.org/matches

My code so far is:

     elif message.text == "Matches":
        url_news = "https://www.hltv.org/matches"
        response = requests.get(url_news)
        soup = BeautifulSoup(response.content, "html.parser")
        match_info = []
        match_items = soup.find("div", class_="upcomingMatchesSection")
        print(match_items)
        for item in match_items:
            match_info.append({
                    "link": item.find("div", class_="upcomingMatch").text,
                    "title": item["href"]

            })

I don't know how to get the links from this markup. I'd appreciate any help.

CodePudding user response：

What happens?

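You iterate over match_items, but soup.find() returned only the single section element that contains the matches, not the matches themselves. Iterating over that element walks its direct children (text nodes included), so item is never reliably an upcomingMatch element and item["href"] has nothing to read.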
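
To make the point above concrete, here is a minimal sketch of what iterating over a single element yields. The HTML string is a hypothetical, heavily simplified stand-in for the real page, not a copy of it:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
html = """
<div class="upcomingMatchesSection">
  <div class="upcomingMatch"><a href="/matches/1/a-vs-b">A vs B</a></div>
  <div class="upcomingMatch"><a href="/matches/2/c-vs-d">C vs D</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
section = soup.find("div", class_="upcomingMatchesSection")

# Iterating over a Tag walks its direct children, whitespace text nodes included.
child_types = [type(child).__name__ for child in section]
print(child_types)
```

The whitespace between the inner elements shows up as NavigableString children, which is why looping over the section directly is not the same as looping over the matches.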

How to fix?

Select the upcomingMatch elements instead and iterate over them:

match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")

To get the URL, select the <a> inside each match:

item.a["href"]

Example

from bs4 import BeautifulSoup
import requests


url_news = "https://www.hltv.org/matches"
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
match_info = []
# Select every upcomingMatch inside the upcomingMatchesSection
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")

for item in match_items:
    match_info.append({
        "title": item.get_text('|', strip=True),  # all text in the match, joined by '|'
        "link": item.a["href"]  # href of the first <a> in the match
    })
match_info

Output

[{'title': '09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds',
  'link': '/matches/2352066/1win-vs-k23-pinnacle-fall-series-2'},
 {'title': '09:00|bo3|INDE IRAE|Nemiga|Pinnacle Fall Series 2|Odds',
  'link': '/matches/2352067/inde-irae-vs-nemiga-pinnacle-fall-series-2'},
 {'title': '10:00|bo3|OPAA|Nexus|Malta Vibes Knockout Series 3|Odds',
  'link': '/matches/2352207/opaa-vs-nexus-malta-vibes-knockout-series-3'},
 {'title': '11:00|bo3|Checkmate|TBC|Funspark ULTI 2021 Asia Regional Series 3|Odds',
  'link': '/matches/2352092/checkmate-vs-tbc-funspark-ulti-2021-asia-regional-series-3'},
 {'title': '11:00|bo3|ORDER|Alke|ESEA Premier Season 38 Australia|Odds',
  'link': '/matches/2352122/order-vs-alke-esea-premier-season-38-australia'},...]
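
The links in this output are relative paths. If the bot needs to send clickable URLs, one option is to resolve them against the page URL with urljoin from the standard library. This is a small sketch that reuses the first entry of the output above; the names BASE_URL and absolute_links are only illustrative:

```python
from urllib.parse import urljoin

# The page the relative links were scraped from.
BASE_URL = "https://www.hltv.org/matches"

# First entry copied from the output above.
match_info = [
    {"title": "09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds",
     "link": "/matches/2352066/1win-vs-k23-pinnacle-fall-series-2"},
]

# Resolve each relative link against the base URL.
absolute_links = [urljoin(BASE_URL, m["link"]) for m in match_info]
print(absolute_links[0])
# https://www.hltv.org/matches/2352066/1win-vs-k23-pinnacle-fall-series-2
```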

CodePudding user response：

You can try this out.

All the match information is inside <div> elements with the class name upcomingMatch.
Select all of those <div> elements and, from each one, extract the match link from the <a> tag with the class name match.

Here is the code:

import requests
from bs4 import BeautifulSoup

url_news = "https://www.hltv.org/matches"
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
response = requests.get(url_news,headers=headers)
soup = BeautifulSoup(response.text, "lxml")
match_items = soup.find_all("div", class_="upcomingMatch")

for match in match_items:
    link = match.find('a', class_='match a-reset')['href']
    print(f'Link: {link}')

Link: /matches/2352235/malta-vibes-knockout-series-3-quarter-final-1-malta-vibes-knockout-series-3
Link: /matches/2352098/pinnacle-fall-series-2-quarter-final-2-pinnacle-fall-series-2
Link: /matches/2352236/malta-vibes-knockout-series-3-quarter-final-2-malta-vibes-knockout-series-3
Link: /matches/2352099/pinnacle-fall-series-2-quarter-final-3-pinnacle-fall-series-2
.
.
.
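
One caveat about the snippet above: passing class_='match a-reset' as a single string matches only when the class attribute is exactly that string, so it stops matching if the classes are reordered or another class is added. A somewhat more tolerant variant, sketched here against hypothetical simplified markup rather than the live page, uses the CSS selector a.match and skips entries without a link:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class order differs in the second entry,
# and the third entry has no link at all.
html = """
<div class="upcomingMatch"><a class="match a-reset" href="/matches/1/a-vs-b">A vs B</a></div>
<div class="upcomingMatch"><a class="a-reset match extra" href="/matches/2/c-vs-d">C vs D</a></div>
<div class="upcomingMatch"><span>no link here</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
links = []
for match in soup.find_all("div", class_="upcomingMatch"):
    # "a.match" matches any <a> whose class list contains "match".
    anchor = match.select_one("a.match")
    if anchor is not None and anchor.has_attr("href"):
        links.append(anchor["href"])

print(links)
# ['/matches/1/a-vs-b', '/matches/2/c-vs-d']
```

Whether this is needed depends on how stable the site's markup is; it is a defensive option, not a correction to the answer.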