I defined two separate functions: one for opening a URL with Selenium and one for fetching data with Selenium.
In my second function the driver variable is not accessible, because it is local to the first function.
I do not know whether it is logical to split the Selenium activity into two functions like this; it is my first time using this approach.
Any suggestions on how to get the webdriver instance and use it inside the second function?
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
#reading from csv file url-s
def readCSV(path_csv):
    df = pd.read_csv(path_csv)
    return df

fileCSV = readCSV(r'C:\Users\Admin\Downloads\urls.csv')
length_of_column_urls = fileCSV['linkamazon'].last_valid_index()

#going to urls 1-by-1
def goToUrl_Se():
    for i in range(0, length_of_column_urls + 1):
        xUrl = fileCSV.iloc[i, 1]
        print(xUrl, i)
        # going to url(a,amazn) via Selenium WebDriver
        chrome_options = Options()
        chrome_options.headless = False
        chrome_options.add_argument("start-maximized")
        # options.add_experimental_option("detach", True)
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        webdriver_service = Service(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
        driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
        driver.get(xUrl)
        driver.quit()

#fetch-parse the data from url page
def parse_data():
    x_title = driver.find_element(By.XPATH, '//*[@id="search"]/div[1]/div[1]/div/span[3]/div[2]/div[2]/div/div/div/div/div/div[2]/div/div/div[1]/h2/a/span')

goToUrl_Se()
CodePudding user response:
As I see it, you are trying to parse data from each URL you open in goToUrl_Se(). If so, the better way is to put the parsing code inside the loop in goToUrl_Se().
Also, there is no need to create a new driver for every URL; create it once and reuse it.
And you definitely have to improve your locators. Very long absolute XPaths are extremely fragile: any small change in the page layout breaks them.
The following flow seems better to me:
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# module-level name; create_driver() assigns the real driver to it
driver = None

#reading from csv file url-s
def readCSV(path_csv):
    df = pd.read_csv(path_csv)
    return df

fileCSV = readCSV(r'C:\Users\Admin\Downloads\urls.csv')
length_of_column_urls = fileCSV['linkamazon'].last_valid_index()

def create_driver():
    # rebind the module-level driver instead of creating a local variable
    global driver
    chrome_options = Options()
    chrome_options.headless = False
    chrome_options.add_argument("start-maximized")
    # options.add_experimental_option("detach", True)
    chrome_options.add_argument("--no-sandbox")
    # a second add_experimental_option('excludeSwitches', ...) call would
    # overwrite the first one, so both switches go into a single list
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    webdriver_service = Service(r'C:\pythonPro\w_crawl\AmznScrpBot\chromedriver.exe')
    driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

#going to urls 1-by-1
def goToUrl_Se():
    for i in range(0, length_of_column_urls + 1):
        xUrl = fileCSV.iloc[i, 1]
        print(xUrl, i)
        # going to url(a,amazn) via Selenium WebDriver
        driver.get(xUrl)
        x_title = driver.find_element(By.XPATH, '//*[@id="search"]/div[1]/div[1]/div/span[3]/div[2]/div[2]/div/div/div/div/div/div[2]/div/div/div[1]/h2/a/span')
    # quit once, after all URLs have been visited
    driver.quit()

create_driver()
goToUrl_Se()
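To illustrate the point about fragile locators, here is a minimal sketch of a shorter, relative locator. It keeps only the two anchors visible in the long XPath (the element with id "search" and the trailing h2/a/span) and is an assumption about the page markup, not a verified Amazon selector. The names CSS, TITLE_SELECTOR and first_title are made up for the example; "css selector" is the string value behind By.CSS_SELECTOR.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"

# Hypothetical selector: any span inside a link inside an h2 under #search.
# It matches more loosely than the absolute XPath, so check it on the real page.
TITLE_SELECTOR = "#search h2 a span"


def first_title(driver):
    """Return the text of the first matching title, or None if nothing matches."""
    # find_elements returns an empty list instead of raising when nothing is found
    matches = driver.find_elements(CSS, TITLE_SELECTOR)
    return matches[0].text if matches else None
```

Inside the loop this would replace the find_element call, for example x_title = first_title(driver), and a None result tells you the page did not contain the expected element.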
CodePudding user response:
You should return the driver from your create_driver() function:
def create_driver():
    # ...
    return driver
and change your function to accept a parameter:
def parse_data(driver):
    # ...
Now you can get the driver with an assignment and pass it to your function:
driver = create_driver()
parse_data(driver)
I suggest you read more about return values and function parameters to understand this better.
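Put together, the return-and-pass idea might look roughly like the sketch below. The chromedriver_path parameter, the "h2" selector and the names scrape_titles and main are placeholders for illustration only; the Selenium imports sit inside create_driver() so the other functions can be tried without a browser.

```python
def create_driver(chromedriver_path):
    """Build a Chrome driver and return it to the caller."""
    # Imported here so the rest of the sketch does not need a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    chrome_options = Options()
    chrome_options.add_argument("start-maximized")
    return webdriver.Chrome(service=Service(chromedriver_path), options=chrome_options)


def parse_data(driver):
    """Read one value from whatever page the driver currently shows."""
    # "css selector" is the string behind By.CSS_SELECTOR;
    # "h2" is a placeholder selector, not the real page markup.
    return driver.find_element("css selector", "h2").text


def scrape_titles(urls, driver):
    """Visit every URL with the same driver and collect the parsed values."""
    titles = []
    for url in urls:
        driver.get(url)
        titles.append(parse_data(driver))
    return titles


def main(urls, chromedriver_path):
    driver = create_driver(chromedriver_path)
    try:
        return scrape_titles(urls, driver)
    finally:
        # quit exactly once, even if a page fails to parse
        driver.quit()
```

The driver is created in one place, handed to every function that needs it, and closed in one place, so no function depends on a variable that lives inside another function.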
CodePudding user response:
With this structure you can call your second function, parse_data(), only from within your first function, goToUrl_Se(), like this:
    driver.get(xUrl)
    something = parse_data()
and change parse_data() so that it returns something.
If you want to call both functions independently, from outside each other, then you need to do two things:
- parse_data should take driver as an argument:
  def parse_data(driver):
- you should not quit Selenium within goToUrl_Se()
And if you want to do it the way it really should be done, just use OOP. If you still don't want to, then you'd better create the driver name outside any function and use functions to change it. For instance, you can have a function that only changes the driver's options. But it is bad practice for one function to do multiple things, as your goToUrl_Se() does.
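For the OOP route, one possible shape is a small class that owns the driver, so every method reaches it through self. This is only a sketch: the class name PageScraper, the with_chrome() helper and the "h2" selector are invented for the example, and "css selector" is the string behind By.CSS_SELECTOR.

```python
class PageScraper:
    """Owns one WebDriver; each method does a single job."""

    def __init__(self, driver):
        self.driver = driver

    @classmethod
    def with_chrome(cls, chromedriver_path):
        # Imported here so the class itself can be tried without a browser.
        from selenium import webdriver
        from selenium.webdriver.chrome.service import Service

        return cls(webdriver.Chrome(service=Service(chromedriver_path)))

    def go_to_url(self, url):
        self.driver.get(url)

    def parse_data(self):
        # Placeholder selector; adjust it to the real page.
        return self.driver.find_element("css selector", "h2").text

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Quit the browser when the with-block ends, even after an error.
        self.driver.quit()
        return False
```

Usage would be along the lines of: with PageScraper.with_chrome(path) as scraper: scraper.go_to_url(url); title = scraper.parse_data(). Opening pages, parsing and cleanup stay separate, yet they all share the same driver.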