how to use driver.get(url) from extracted href list from the page?

Time:09-20

I want to go to https://www.bookmaker.com.au/sports/soccer and extract the soccer URLs, which works. I then want to visit each of those pages through driver.get(url), extract the data from each one, and place it in a pandas DataFrame. I am stuck on calling driver.get(url) for each of the extracted links. Any help appreciated.

CSS selector for the links whose hrefs I want to pass to driver.get(url):

a[class *= 'matches-filter__region']

import time
import pandas as pd
import webdriver_manager.chrome
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
###########################################################################################################################################################

options = webdriver.ChromeOptions()
options.add_argument('--start-maximized')
options.add_experimental_option("detach", True)
service = Service('driver/chromedriver.exe')
driver = webdriver.Chrome(service=Service(webdriver_manager.chrome.ChromeDriverManager().install()), options=options)
driver.get('https://www.bookmaker.com.au/sports/soccer')
aa = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")))
################################################################################################################

for url in aa:
    aa = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
    driver.get(aa)

##############################################################################

#Full Code https://pastebin.com/W0VqaKVD

CodePudding user response:

I updated the code and checked it; it works, navigating to all 15 URLs in the same browser window:

diff_country_urls = []
for a in aa:
    diff_country_urls.append(a.get_attribute("href"))

for url in diff_country_urls:
    driver.get(url)

CodePudding user response:

After using a VPN I could connect, and I found a few problems with the code:

  1. You have to get aa before the for-loop (your loop re-runs find_elements() on every iteration).

  2. find_elements() gives references to objects in the browser's memory, but get() removes those objects from memory to create the objects for the new page, so the earlier find_elements() results become useless (stale). You have to use .get_attribute('href') first to copy the URLs out as strings.

  3. You have to use a for-loop to run get() for every string in the list, and the rest of the scraping code has to run inside that loop. The DataFrame is created after the loop.
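Point 2 can be reduced to a small helper: copy the hrefs out of the live WebElements into plain strings before the first navigation, because driver.get() destroys the old DOM and every old element reference with it. A minimal sketch (the selector is the one from the question; `driver` is assumed to be an open Chrome session):

```python
def collect_hrefs(elements):
    """Copy the href attribute of each WebElement into a plain string.

    Strings survive navigation; the WebElements themselves do not.
    """
    return [el.get_attribute('href') for el in elements]

# Usage with a live driver:
#   links = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")
#   urls = collect_hrefs(links)   # safe: strings, not DOM references
#   for url in urls:
#       driver.get(url)           # old `links` are stale now, but `urls` still works
```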

Here is the code without the scraping part inside the loop, but at least it visits all the URLs.

# --- before loop ---

all_a = driver.find_elements(By.CSS_SELECTOR, "a[class *= 'matches-filter__region']")

# get URLs as strings
all_urls = []
for item in all_a:
    all_urls.append(item.get_attribute('href'))

# shorter    
#all_urls = [item.get_attribute('href') for item in all_a]

team1List = []
backOddsList = []
team2List = []
layOddsList = []

# --- loop ---

# visit all pages and get teams

for url in all_urls:
    print(url)

    driver.get(url)

    # here (inside loop) all code to get teams from active page

# --- after loop ---

df = pd.DataFrame({
    'Team1': team1List,
    'Back Odds': backOddsList,
    'Team2': team2List,
    'Lay Odds': layOddsList
})
 
df.to_excel('bookmaker.xlsx', engine='openpyxl', sheet_name='Sheet_name_1', index=False)
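For the part marked "here (inside loop) all code to get teams from active page", a hedged sketch of the shape it could take. The selectors ('.team-name', '.back-odds', '.lay-odds') are placeholders, not the real ones on bookmaker.com.au; inspect the pages and substitute your own. The final assembly step matches the DataFrame layout above:

```python
import pandas as pd

def rows_to_frame(team1, back_odds, team2, lay_odds):
    """Assemble the four collected lists into the answer's DataFrame layout."""
    return pd.DataFrame({
        'Team1': team1,
        'Back Odds': back_odds,
        'Team2': team2,
        'Lay Odds': lay_odds,
    })

# Inside the loop, roughly (hypothetical selectors -- adjust to the real page):
#   for url in all_urls:
#       driver.get(url)
#       WebDriverWait(driver, 15).until(
#           EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.team-name')))
#       team1List.extend(e.text for e in driver.find_elements(By.CSS_SELECTOR, '.team-name'))
#       backOddsList.extend(e.text for e in driver.find_elements(By.CSS_SELECTOR, '.back-odds'))
#       ...
# and after the loop:
#   df = rows_to_frame(team1List, backOddsList, team2List, layOddsList)
```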