python Selenium taking instance of webdriver


I defined two separate functions: one for opening a URL with Selenium and one for fetching data with Selenium. In my second function the driver variable is not accessible because it stays local to the first function. I don't know whether it is sensible to split the Selenium work into two separate functions like this; it's my first time using this approach. Any suggestions on how to take the instance of the webdriver and use it inside the second function?

import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

#reading from csv file url-s
def readCSV(path_csv):
    df=pd.read_csv(path_csv)
    return df

fileCSV=readCSV(r'C:\Users\Admin\Downloads\urls.csv')
length_of_column_urls=fileCSV['linkamazon'].last_valid_index()

#going to urls 1-by-1
def goToUrl_Se():
    for i in range(0, length_of_column_urls + 1):
        xUrl = fileCSV.iloc[i, 1]
        print(xUrl,i)
        # going to url(a,amazn) via Selenium WebDriver
        chrome_options = Options()
        chrome_options.headless = False
        chrome_options.add_argument("start-maximized")
        # options.add_experimental_option("detach", True)
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')

        webdriver_service = Service(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
        driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
        driver.get(xUrl)

    driver.quit()

#fetch-parse the data from url page
def parse_data():
    x_title=driver.find_element(By.XPATH,'//*[@id="search"]/div[1]/div[1]/div/span[3]/div[2]/div[2]/div/div/div/div/div/div[2]/div/div/div[1]/h2/a/span')

goToUrl_Se()

CodePudding user response:

As I understand it, you are trying to parse data from each URL you open in goToUrl_Se(). If so, the better way is to put the parsing code inside the loop used in the goToUrl_Se() method.
Also, there is no need to define and create the driver each time.
And you should definitely improve your locators: very long absolute XPaths are extremely fragile and break easily.
The following flow seems better to me.

import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = None

#reading from csv file url-s
def readCSV(path_csv):
    df=pd.read_csv(path_csv)
    return df

fileCSV=readCSV(r'C:\Users\Admin\Downloads\urls.csv')
length_of_column_urls=fileCSV['linkamazon'].last_valid_index()

def create_driver():
    global driver   # assign the module-level driver defined above instead of creating a new local one
    chrome_options = Options()
    chrome_options.headless = False
    chrome_options.add_argument("start-maximized")
    # options.add_experimental_option("detach", True)
    chrome_options.add_argument("--no-sandbox")
    # merged into one call: a second add_experimental_option with the same key would overwrite the first
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')

    webdriver_service = Service(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
    driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

#going to urls 1-by-1
def goToUrl_Se():
    for i in range(0, length_of_column_urls + 1):
        xUrl = fileCSV.iloc[i, 1]
        print(xUrl,i)
        # going to url(a,amazn) via Selenium WebDriver
        driver.get(xUrl)
        x_title=driver.find_element(By.XPATH,'//*[@id="search"]/div[1]/div[1]/div/span[3]/div[2]/div[2]/div/div/div/div/div/div[2]/div/div/div[1]/h2/a/span')
    driver.quit()

create_driver()
goToUrl_Se()
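
On the locator point: WebDriverWait and expected_conditions are already imported, so an explicit wait on a shorter, relative selector is an option. The CSS selector below is only an illustrative guess, not a verified Amazon locator:

wait = WebDriverWait(driver, 10)
x_title = wait.until(EC.visibility_of_element_located(
    (By.CSS_SELECTOR, 'div.s-result-item h2 a span')))   # illustrative selector only
print(x_title.text)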

CodePudding user response:

You should return the driver from your create_driver() function:

def create_driver():
    # ...
    return driver

and change your function to accept a parameter:

def parse_data(driver):
    # ...

Now you can get the driver with an assignment and pass it to your function:

driver = create_driver()
parse_data(driver)

I suggest you read more about return values and function parameters to understand this better.
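
Spelled out in full, a minimal sketch of that pattern might look like this (it reuses the imports, chromedriver path, and XPath from the question; the options setup is trimmed down):

def create_driver():
    chrome_options = Options()
    chrome_options.add_argument("start-maximized")
    webdriver_service = Service(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
    driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    return driver                      # hand the instance back to the caller

def parse_data(driver):
    # the driver parameter is the same instance returned by create_driver()
    x_title = driver.find_element(By.XPATH, '//*[@id="search"]/div[1]/div[1]/div/span[3]/div[2]/div[2]/div/div/div/div/div/div[2]/div/div/div[1]/h2/a/span')
    return x_title.text

driver = create_driver()
driver.get(fileCSV.iloc[0, 1])         # e.g. the first URL from the CSV
print(parse_data(driver))
driver.quit()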

CodePudding user response:

With this structure you can only call your second function parse_data() from inside your first function, goToUrl_Se().

like:

driver.get(xUrl)
something = parse_data(driver)

and change parse_data so that it accepts the driver and returns something.

if you want to call them both outside themselves, then you need to do 2 things:

  1. parse_data should get the driver as an argument: def parse_data(driver)
  2. you should not call driver.quit() inside goToUrl_Se() (see the sketch after this list)
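
A minimal sketch of those two changes, assuming parse_data(driver) is defined as in the previous answer and the driver is created elsewhere (for example by a create_driver() that returns it):

def goToUrl_Se(driver):
    for i in range(0, length_of_column_urls + 1):
        xUrl = fileCSV.iloc[i, 1]
        driver.get(xUrl)
        something = parse_data(driver)   # point 1: pass the driver in
        print(something, i)
    # point 2: no driver.quit() here -- the caller decides when to quit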

And if you want to do it as it really should be done, just use OOP. If you still don't want to, you'd better initialize the driver outside any function and use functions to change it. For instance, you could have a function that only changes the driver's options. But it is bad practice for one function to do multiple things, as your goToUrl_Se() does.
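
For illustration, a minimal OOP sketch along those lines (the class and method names are my own, not from the question, and the locator is a shortened placeholder):

class AmazonScraper:
    def __init__(self, driver_path):
        chrome_options = Options()
        chrome_options.add_argument("start-maximized")
        self.driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)

    def go_to_url(self, url):
        self.driver.get(url)

    def parse_title(self):
        # placeholder locator -- replace with a robust one for the target page
        return self.driver.find_element(By.XPATH, '//*[@id="search"]//h2/a/span').text

    def quit(self):
        self.driver.quit()

scraper = AmazonScraper(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
for i in range(0, length_of_column_urls + 1):
    scraper.go_to_url(fileCSV.iloc[i, 1])
    print(scraper.parse_title(), i)
scraper.quit()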
