I am new to Selenium and am trying to scrape data (just names for now) from the bourbon product cards on thewhiskyexchange.com. I have tested all of my CSS (and XPath) selectors in the Scrapy shell, so I know they are correct, but the output is coded information about the "session" and the "element" that I do not understand. The number of items in the list seems to be correct, so maybe Selenium is doing exactly what it is supposed to do and I just don't know how to convert the output into something I can use. How do I get just the names from the product cards?
I have tried both the driver and the local selector functions Selenium offers, with the same results. Beautiful Soup functions return the data I need, but that method is too inefficient for the scope of the project I am working on. Any insight into how I can fix this would be greatly appreciated.
IN[]:
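import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--window-size=1920,1080") # Chrome expects width,height
chrome_options.binary_location = r"C:\Program Files\Google\Chrome\Application\chrome.exe" # raw string keeps the backslashes literal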
IN[]:
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options) # pass the options so they take effect
IN[]:
url = "https://www.thewhiskyexchange.com/c/639/bourbon-whiskey"
driver.get(url)
time.sleep(5) # 5-second delay so the page can finish rendering
html = driver.page_source
html # page source (a string of HTML) is as expected
IN[]:
els = driver.find_elements_by_css_selector('p.product-card__name')
# local method: els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
els
OUT[]:
[<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="b9384a19-f8c9-46b2-be99-780200dcba99")>,
<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="af76dfa8-b86c-426a-8ad8-30ea904ed11b")>,
<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="58b14e5a-6bc3-443a-807f-ec696e83b096")>, ...
CodePudding user response:
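find_elements returns a list of web elements, whereas find_element returns a single web element. What you are seeing is the default representation of those WebElement objects, not their content.
You can iterate over the list and read each element's text attribute, as shown below: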
IN[]:
els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
for e in els:
    print(e.text) # .text holds the element's visible text
Also, note that find_elements_by_css_selector is deprecated in Selenium 4 (and was removed in Selenium 4.3.0), so you should use find_elements(By.CSS_SELECTOR, "") instead. This requires importing By with: from selenium.webdriver.common.by import By
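If you want the names in a list rather than printed, the loop above can be wrapped in a small helper. This is only a sketch: extract_names is a hypothetical name, and the stand-in objects and product names below are made up so the snippet runs without a browser. With Selenium you would pass in the list returned by find_elements instead.

```python
from types import SimpleNamespace


def extract_names(elements):
    """Return the stripped text of each element, skipping blank entries.

    `elements` is any iterable of objects with a `.text` attribute,
    such as the WebElement list returned by driver.find_elements(...).
    """
    names = []
    for element in elements:
        text = element.text.strip()  # for a WebElement, .text is the visible text
        if text:
            names.append(text)
    return names


# Stand-in objects (assumption: invented names, not real scraped data).
# With Selenium: extract_names(driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name'))
fake_elements = [
    SimpleNamespace(text="  Example Bourbon A "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Example Bourbon B"),
]
print(extract_names(fake_elements))
```

Keep in mind that .text only returns text that is actually rendered on the page, so a blank result for a hidden element is expected; in that case get_attribute("textContent") may be worth trying.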