Web Scraping: Get multiple tags with the same class name

Time:09-30

I'm scraping 'www.espncricinfo.com' to extract match data. The problem is that each match contains two tags with the same name and class (one per team), and I only get the first one. Calling find_all() inside the for loop is not working for me.

import requests
from bs4 import BeautifulSoup


def show_scores():
    team2 = ''
    list_of_dict = []
    url = 'https://www.espncricinfo.com/live-cricket-score'
    html_text = requests.get(url).content
    soup = BeautifulSoup(html_text, features='lxml')
    main_div = soup.find_all('div', class_='ds-text-compact-xxs')
    for tag in main_div:
        status = tag.find('span', class_='ds-text-tight-xs ds-font-bold ds-uppercase ds-leading-5')
        team1 = tag.find('p', class_='ds-text-tight-m ds-font-bold ds-capitalize ds-truncate')
        score_team1 = tag.find('strong', class_='')

        # Problem: the opponent's name and score (team2) use the same tag and class as team1, so find() only returns the first match

        final_result = tag.find('p', class_='ds-text-tight-s ds-font-regular ds-truncate ds-text-typo-title')
        details = tag.find('div', class_='ds-text-tight-xs ds-truncate ds-text-ui-typo-mid')
        if status is None:
            continue
        elif status.text.capitalize() == 'Result':
            sum_up = {'Status': status.text, 'Details': details.text, 'Team1': team1.text, 'Team2': team2, 'Score1': score_team1.text, 'Result': final_result.text}
            list_of_dict.append(sum_up)
    return list_of_dict


if __name__ == '__main__':
    show = show_scores()
    for item in show:
        print(f"Showing {item['Status']} for {item['Details']}\n{item['Team1']}\t{item['Score1']}\n{item['Result']}\n")

Output:

Showing RESULT for 5th T20I (N), Lahore, September 28, 2022, England tour of Pakistan
Pakistan    145
Pakistan won by 6 runs

Showing RESULT for 1st T20I (N), Thiruvananthapuram, September 28, 2022, South Africa tour of India
South Africa    106/8
India won by 8 wickets (with 20 balls remaining)

CodePudding user response:

Using find_next should work.

Where your # Problem exists:... comment is, add:

team2 = None if team1 is None else team1.find_next(
   'p', class_='ds-text-tight-m ds-font-bold ds-capitalize ds-truncate')
score_team2 = None if score_team1 is None else score_team1.find_next(
   'strong', class_='')

and then, just before list_of_dict.append(sum_up) add

sum_up['Team2'] = team2.text if team2 else ''
sum_up['Score2'] = score_team2.text if score_team2 else ''

Then you can use them however you like in your print call or anywhere else.

(The if...else... guard is only strictly necessary for score_team2 here, but I wrapped the other three lookups too, just in case.)
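
To see the difference between find and find_next without hitting the live site, here is a minimal sketch on a made-up HTML snippet. The markup, the class names (card, team, name) and the team names and scores are all assumptions for illustration; they are not the real ESPNcricinfo page. It also shows find_all called on the card itself, which is another way to collect both teams.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for one match card.
# Class names, team names and scores are invented for this example.
HTML = """
<div class="card">
  <span class="status">RESULT</span>
  <div class="team"><p class="name">Team A</p><strong>150</strong></div>
  <div class="team"><p class="name">Team B</p><strong>144/8</strong></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
card = soup.find("div", class_="card")

# find() returns only the first matching descendant of the card.
team1 = card.find("p", class_="name")
score1 = card.find("strong")

# find_next() continues from that element in document order
# and returns the next element that matches.
team2 = team1.find_next("p", class_="name") if team1 else None
score2 = score1.find_next("strong") if score1 else None

# Alternative: find_all() on the card collects every match inside it.
names = [p.get_text(strip=True) for p in card.find_all("p", class_="name")]
scores = [s.get_text(strip=True) for s in card.find_all("strong")]

print(team1.text, score1.text)
print(team2.text if team2 else "", score2.text if score2 else "")
print(names, scores)
```

One thing to keep in mind: find_next is not limited to the current card, so if a card only contains one team it can return an element from the following card. find_all called on the card only looks inside that card.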
