For loop over a lot of different URLs

Time:12-27

I'm a total novice in Python. After many YouTube videos and tutorials, I'm trying to scrape basketball starting lineups from Flashscore. Here's an example link: https://www.flashscore.it/partita/6PN3pAhq/#informazioni-partita/formazioni

As you can see, in the middle there's a code (6PN3pAhq) that identifies a particular match; every match has a different one. I scraped all the results (144 matches at the moment) and stored them in an Excel file. Now I'm looking for the best way to loop through these different URLs, scrape every match's lineups, and append them to a single dataframe.

Here's my code for the URL above. Any help is much appreciated!

from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import pandas as pd

URL = "https://www.flashscore.it/partita/6PN3pAhq/#informazioni-partita/formazioni"

driver = webdriver.Chrome(r"C:\chromedriver.exe")
driver.get(URL)

sleep(5)

driver.find_element_by_id('onetrust-accept-btn-handler').click()
soup = BeautifulSoup(driver.page_source, "html.parser")

start = []

# Named `section` rather than `id`, which would shadow the Python builtin
section = soup.find(class_="section")

for link in section.find_all("a", {"class": "lf__participantName"}):
    start.append(link.get('href'))

df = pd.DataFrame(start)

print(df)

CodePudding user response:

If you need to keep all the matches in an Excel file, you can use any of several open-source libraries to parse the file and extract the match codes (see http://www.python-excel.org/ for available options).

However, the simplest way, if possible, is to bypass Excel entirely and store the codes in a text file or directly in your Python program:

games = [
    '6PN3pAhq',
    'game2Code',
    'game3Code',
    # and more...
]
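
If the codes are kept in a text file instead, a small loader along these lines could produce the same kind of list. This is only a sketch: the function name `load_game_codes` and the one-code-per-line file layout are assumptions, not something stated in the question.

```python
from pathlib import Path


def load_game_codes(path):
    """Return the match codes stored in a text file.

    Assumes (hypothetically) one code per line. Blank lines are skipped
    and surrounding whitespace, including the trailing newline, is removed.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Stripping each line matters here because a newline left on the end of a code would end up inside the formatted URL.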

And in your core code, use a string template:


# The url_template below contains a `{}` placeholder where we can put any value when we format the string
# see: https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method
_url_template = "https://www.flashscore.it/partita/{}/#informazioni-partita/formazioni"

# Keep the logic for getting one game's scores in its own function.
# It has to be defined before the loop that calls it.
def get_game_scores(game_code):
   formatted_url = _url_template.format(game_code)
   # do the stuff you did above, using formatted_url

for game_code in games:
   get_game_scores(game_code)

There are lots of ways to go about this, but the core idea of this simple implementation is to separate how you extract and parse the game codes from how you get the game scores. The parser should store the game codes in a collection you can simply loop over, and the score-fetching logic can then focus on extracting a single game's score.
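
That separation could be sketched roughly as follows. The names `build_url`, `collect_lineups` and the `fetch_lineup` callback are hypothetical; in the real script, `fetch_lineup` would wrap the Selenium and BeautifulSoup code from the question and return the list of hrefs found on one match page.

```python
URL_TEMPLATE = "https://www.flashscore.it/partita/{}/#informazioni-partita/formazioni"


def build_url(game_code):
    """Fill one match code into the URL template."""
    return URL_TEMPLATE.format(game_code)


def collect_lineups(game_codes, fetch_lineup):
    """Call fetch_lineup(url) for every code and gather the rows in one list.

    fetch_lineup is a hypothetical callable that returns an iterable of
    player links for a single match page.
    """
    rows = []
    for code in game_codes:
        for player_href in fetch_lineup(build_url(code)):
            rows.append({"game_code": code, "player_href": player_href})
    return rows


if __name__ == "__main__":
    # Stand-in fetcher with a made-up href, so the sketch runs without a browser.
    def fake_fetch(url):
        return ["/placeholder-player-link/"]

    print(collect_lineups(["6PN3pAhq"], fake_fetch))
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to get a single dataframe covering every match, with the match code kept alongside each player link.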

CodePudding user response:

Thanks for your help. I wasn't able to make your piece of code work, but I got what you meant. I solved it this way:

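matchid = []

# Raw string so the backslash in the Windows path is taken literally
with open(r"G:\matchid.txt", "r") as f:
    for line in f:
        # strip the trailing newline so it doesn't end up in the URL
        matchid.append(line.strip())

_url_template = "https://www.flashscore.it/partita/{}/#informazioni-partita/formazioni"

for x in matchid:
    formatted_url = _url_template.format(x)
    print(formatted_url)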

Then I attached the other part of the code, and it works great!
