I want to open https://www.bookmaker.com.au/sports/soccer and extract the soccer URLs, which already works. I then want to visit each of those pages with driver.get(url). The links are collected as a list, and the data scraped from each URL should be placed in pandas. I am stuck at calling driver.get(url) for each of the extracted links. Any help appreciated.
Css/href for driver.get(url):
url = #a[class *= 'matches-filter__region']
import time
import pandas as pd
import webdriver_manager.chrome
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
###########################################################################################################################################################
options = webdriver.ChromeOptions()
options.add_argument('--start-maximized')
options.add_experimental_option("detach", True)
service = Service('driver/chromedriver.exe')
driver = webdriver.Chrome(service=Service(webdriver_manager.chrome.ChromeDriverManager().install()), options=options)
driver.get('https://www.bookmaker.com.au/sports/soccer')
aa = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")))
################################################################################################################
for url in aa:
    aa = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
    driver.get(aa)
##############################################################################
#Full Code https://pastebin.com/W0VqaKVD
CodePudding user response:
I updated the code and checked that it works: it navigates to all 15 URLs in the same browser window:
diff_country_urls = []
for i in range(len(aa)):
    diff_country_urls.append(aa[i].get_attribute("href"))
for url in diff_country_urls:
    driver.get(url)
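The href extraction above can also be wrapped in a small helper. This is only a sketch: collect_hrefs is a hypothetical name, not part of Selenium, and it assumes each element exposes get_attribute("href") the way a Selenium WebElement does. It additionally skips links without an href and drops duplicates, which the original loop does not do.

```python
def collect_hrefs(elements):
    """Return unique, non-empty href strings in first-seen order.

    Hypothetical helper: `elements` is assumed to be the list returned by
    driver.find_elements(...), read BEFORE any call to driver.get().
    """
    seen = set()
    urls = []
    for element in elements:
        href = element.get_attribute("href")  # None when the attribute is missing
        if href and href not in seen:
            seen.add(href)
            urls.append(href)
    return urls
```

With the question's variables this would be called as diff_country_urls = collect_hrefs(aa), followed by the same for-loop over the resulting strings.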
CodePudding user response:
After connecting through a VPN I could load the page, and I found a few problems with the code:
1. You have to get aa before the for-loop.
2. find_elements() gives references to objects in the browser's memory, but when you call get() the browser removes these objects from memory to create new objects for the new page, so the results of find_elements() become useless. You have to use .get_attribute('href') to get the URLs as strings.
3. You have to use a for-loop to run get() for every string in the list, and you have to run the rest of the scraping code inside this loop. After the loop you have to create the DataFrame.
This is the code without the scraping code inside the loop, but at least it visits all URLs.
# --- before loop ---
all_a = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
# get URLs as strings
all_urls = []
for item in all_a:
    all_urls.append(item.get_attribute('href'))
# shorter
#all_urls = [item.get_attribute('href') for item in all_a]
team1List = []
backOddsList = []
team2List = []
layOddsList = []
# --- loop ---
# visit all pages and get teams
for url in all_urls:
    print(url)
    driver.get(url)
    # here (inside loop) all code to get teams from active page
# --- after loop ---
df = pd.DataFrame({
    'Team1': team1List,
    'Back Odds': backOddsList,
    'Team2': team2List,
    'Lay Odds': layOddsList
})
df.to_excel('bookmaker.xlsx', engine='openpyxl', sheet_name='Sheet_name_1', index=False)
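One possible way to organise the "scrape inside the loop, build the DataFrame after the loop" structure is sketched below. The names visit_all and scrape_page are hypothetical, and the per-page scraping itself is still left out: scrape_page stands for whatever code reads the teams and odds from the currently loaded page and returns them as a list of dicts.

```python
def visit_all(driver, urls, scrape_page):
    """Open every URL in turn and gather the rows scraped from each page.

    Hypothetical helper: `driver` only needs a get(url) method, and
    `scrape_page(driver)` is assumed to return a list of dicts for the
    page that is currently loaded.
    """
    rows = []
    for url in urls:
        driver.get(url)                   # navigate first ...
        rows.extend(scrape_page(driver))  # ... then scrape the active page
    return rows
```

If scrape_page returns dicts with the keys 'Team1', 'Back Odds', 'Team2' and 'Lay Odds', the result can be passed straight to pd.DataFrame(visit_all(driver, all_urls, scrape_page)) after the loop has finished, instead of maintaining four separate lists.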