I'm working on web scraping to extract data from the website 'www.espncricinfo.com'. The problem is that there are two identical tags with the same class and I only get the first one; the find_all() function is not working in my for loop.
import requests
from bs4 import BeautifulSoup


def show_scores():
    team2 = ''
    list_of_dict = []
    url = 'https://www.espncricinfo.com/live-cricket-score'
    html_text = requests.get(url).content
    soup = BeautifulSoup(html_text, features='lxml')
    main_div = soup.find_all('div', class_='ds-text-compact-xxs')
    for tag in main_div:
        status = tag.find('span', class_='ds-text-tight-xs ds-font-bold ds-uppercase ds-leading-5')
        team1 = tag.find('p', class_='ds-text-tight-m ds-font-bold ds-capitalize ds-truncate')
        score_team1 = tag.find('strong', class_='')
        # Problem exists: want to get opponent team name and scores as team2 since it has same tag class, so this only get 1st
        final_result = tag.find('p', class_='ds-text-tight-s ds-font-regular ds-truncate ds-text-typo-title')
        details = tag.find('div', class_='ds-text-tight-xs ds-truncate ds-text-ui-typo-mid')
        if status is None:
            continue
        elif status.text.capitalize() == 'Result':
            sum_up = {'Status': status.text, 'Details': details.text, 'Team1': team1.text, 'Team2': team2, 'Score1': score_team1.text, 'Result': final_result.text}
            list_of_dict.append(sum_up)
    return list_of_dict


if __name__ == '__main__':
    show = show_scores()
    for item in show:
        print(f"Showing {item['Status']} for {item['Details']}\n{item['Team1']}\t{item['Score1']}\n{item['Result']}\n")
Output:
Showing RESULT for 5th T20I (N), Lahore, September 28, 2022, England tour of Pakistan
Pakistan	145
Pakistan won by 6 runs

Showing RESULT for 1st T20I (N), Thiruvananthapuram, September 28, 2022, South Africa tour of India
South Africa	106/8
India won by 8 wickets (with 20 balls remaining)
CodePudding user response:
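Using find_next should work: called on the first team's tag, it returns the next matching tag that appears after it in the document.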
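As a rough illustration of how find_next behaves, here is a minimal sketch on made-up markup. The class names "match" and "team" and the sample values are assumptions for the example, not the real ESPNcricinfo markup, and html.parser is used only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two <p> tags sharing one class, as in the question.
html = """
<div class="match">
  <p class="team">Pakistan</p><strong>145</strong>
  <p class="team">England</p><strong>139/7</strong>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
match = soup.find("div", class_="match")

# find() only ever returns the first match inside the container.
first_team = match.find("p", class_="team")
# find_next() searches forward in document order, starting after first_team.
second_team = first_team.find_next("p", class_="team")

first_score = match.find("strong")
second_score = first_score.find_next("strong")

print(first_team.text, first_score.text)
print(second_team.text, second_score.text)
```

One caveat: find_next keeps searching through the rest of the document, not just inside the current container, so if a match block holds only one team it can return a tag from a later block.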
Where your # Problem exists:... comment is, add:
team2 = None if team1 is None else team1.find_next(
    'p', class_='ds-text-tight-m ds-font-bold ds-capitalize ds-truncate')
score_team2 = None if score_team1 is None else score_team1.find_next(
    'strong', class_='')
and then, just before list_of_dict.append(sum_up), add:
sum_up['Team2'] = team2.text if team2 else ''
sum_up['Score2'] = score_team2.text if score_team2 else ''
Then you can use them as you like in your print call or anywhere else.
(The if...else... is only necessary for score_team2 here, but I wrapped the other 3 definitions too just in case.)
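Since the question mentions find_all() in the loop, an alternative worth sketching is to call find_all() on each container and index into the result, which keeps the search inside that container. This is only a sketch under assumptions: the markup, the class names "card" and "team", the sample values, and the helper name parse_cards are all invented for illustration and would need to be swapped for the real page's classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (values are sample data).
html = """
<div class="card">
  <p class="team">Pakistan</p><strong>145</strong>
  <p class="team">England</p><strong>139/7</strong>
</div>
<div class="card">
  <p class="team">South Africa</p><strong>106/8</strong>
  <p class="team">India</p><strong>110/2</strong>
</div>
"""


def parse_cards(markup):
    """Return one dict per card with both team names and both scores."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        # find_all() on the card returns every match inside that card only.
        teams = [p.get_text(strip=True) for p in card.find_all("p", class_="team")]
        scores = [s.get_text(strip=True) for s in card.find_all("strong")]
        # Pad with empty strings so a missing team or score does not raise.
        teams += [""] * (2 - len(teams))
        scores += [""] * (2 - len(scores))
        rows.append({
            "Team1": teams[0], "Score1": scores[0],
            "Team2": teams[1], "Score2": scores[1],
        })
    return rows


rows = parse_cards(html)
for row in rows:
    print(f"{row['Team1']}\t{row['Score1']}\n{row['Team2']}\t{row['Score2']}\n")
```

Compared with find_next, this cannot accidentally pick up a tag from the following card, at the cost of relying on the order of tags inside each card.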