Got empty result when scraping a record

Time:10-12

I wrote a program to scrape attributes from a single record on the web, but my variables end up empty. Below is what I tried. I can't see where my logic is wrong.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='chromedriver.exe')
url = "https://openlibrary.org/works/OL7960560W/Eyewitness?edition=ia:cowboy0000murd_y0x0"
global title
driver.get(url)
wait = WebDriverWait(driver,5)
items = wait.until(EC.presence_of_all_elements_located((By.XPATH,'//div[@]')))
for item in items:
    title = item.find_element(By.CLASS_NAME,'work-title').text

print("title = ",title)

CodePudding user response:

There are several issues here:

  1. You are locating the wrong element.
    There is only 1 element matching '//div[@]'.
  2. Also, instead of presence_of_all_elements_located you should use visibility_of_all_elements_located there. Presence only means the element is in the DOM, and .text is empty for an element that is not displayed.
  3. The print("title = ",title) call should be inside the for loop block. Otherwise title is overwritten on each loop iteration and only the last value is printed.
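To illustrate point 3 without a browser, here is a minimal sketch in plain Python. The found_texts list is made-up stand-in data for the .text values of the located elements, not real output from the page.

```python
# Hypothetical stand-in for the .text values of the located elements.
found_texts = ["first title", "second title", "third title"]

# Assigning inside the loop and printing afterwards keeps only the last value.
title = None
for text in found_texts:
    title = text
print("last only:", title)

# Collecting (or printing) inside the loop keeps every value.
titles = []
for text in found_texts:
    titles.append(text)
print("all:", titles)
```

The first loop mirrors the structure of the code in the question; the second is the pattern to use when every match matters.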

The following code works:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

# Raw string so the backslashes in the Windows path are not treated as escapes
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)

url = "https://openlibrary.org/works/OL7960560W/Eyewitness?edition=ia:cowboy0000murd_y0x0"

driver.get(url)
# Wait until the title links are visible, then read their text
titles = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '.book .title>a')))
for title in titles:
    print(title.text)

The output is:

Eyewitness: Cowboy (Eyewitness Books)
Eyewitness: Horse (Eyewitness Books)
Eyewitness: Goya (Eyewitness Books)

I used a CSS selector here, but XPath would work just as well.
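For reference, a rough XPath equivalent of '.book .title>a' could be built as sketched below. The helper has_class and the constant TITLE_LINKS_XPATH are names made up for this illustration, and the expression assumes the same class names as the CSS selector; I have not run it against the live page.

```python
def has_class(name: str) -> str:
    """Return an XPath predicate that is true when @class contains the token `name`."""
    # Padding with spaces avoids matching "book" inside e.g. "notebook".
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


# '.book .title>a' reads as: an element with class "book",
# any descendant with class "title", then a direct <a> child.
TITLE_LINKS_XPATH = f"//*[{has_class('book')}]//*[{has_class('title')}]/a"

print(TITLE_LINKS_XPATH)
```

The resulting string can then be passed as (By.XPATH, TITLE_LINKS_XPATH) in place of the (By.CSS_SELECTOR, ...) locator. The CSS version is shorter because CSS matches class tokens natively, while XPath has to emulate that with string functions.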

CodePudding user response:

Here is a way of locating those elements, a bit more reliably:

    [...]
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
actions = ActionChains(driver)
wait = WebDriverWait(driver, 20)
url = "https://openlibrary.org/works/OL7960560W/Eyewitness?edition=ia:cowboy0000murd_y0x0e"
driver.get(url)

items = wait.until(EC.presence_of_all_elements_located((By.XPATH,'//table[@id="editions"]//div[@]/a')))
for i in items:
    print(i.text)

Result in terminal:

Eyewitness: Seashore (Eyewitness Books)
Eyewitness: Horse (Eyewitness Books)
Eyewitness: Goya (Eyewitness Books)