Selenium Webdriver - How to extract texts through scraping

Time:06-11

I am trying to scrape information from a company's career website. I want to get the reference code of each job ad.

I want to use Selenium and tried to locate the job posting code with XPath. When I run the code, a Google Chrome window opens and loads the correct web address:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd

PATH = "C:/Users/MyUser/Desktop/Driver/chromedriver.exe"

driver = webdriver.Chrome(PATH)

driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
driver.maximize_window()

ref_code = driver.find_elements_by_xpath("//tr[@data-eui-handler=\"{event:'click',handler:'eui.app.controller.search_results.selectRow'}\"]/td[1]")

print(len(ref_code))

User_input = input()

When I run the code, it takes forever and I get the following output:

DevTools listening on ws://127.0.0.1:52187/devtools/browser/7300c3d2-42d1-4f8e-a136-4e1ce37bcb87
c:\Users\MyUser\Desktop\PyhtonVisStuCo\Selenium.py:15: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
  ref_code = driver.find_elements_by_xpath("//tr[@data-eui-handler=\"{event:'click',handler:'eui.app.controller.search_results.selectRow'}\"]/td[1]")
0
[3516:18308:0609/194039.395:ERROR:device_event_log_impl.cc(214)] [19:40:39.395] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed.

What am I doing wrong?

Answer:

To extract the texts from the Referenzcode column, you can use a list comprehension with either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
    print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "table#table_search_results tr[data-head] td:first-of-type")])
    
  • Using XPATH:

    driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
    print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//table[@id='table_search_results']//tr[@data-head]/td[1]")])
    
  • Console Output:

    ['ZVW22192', 'ZPF2208_ex', 'ZPF2207_e', 'ZPF2206_e', 'ZMF2249', 'ZIT22484', 'ZIT22444', 'ZIT22380', 'ZIT22379', 'WS22536']
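
One possible reason for the 0 in the question is that the result rows are not yet in the DOM when find_elements runs. Assuming that is the case, an explicit wait may help. The following is only a sketch: the helper names clean_codes and scrape_reference_codes and the 20-second timeout are made up for illustration, and calling webdriver.Chrome() without a driver path assumes a Selenium version (4.6 or later) that can locate chromedriver on its own.

```python
from typing import Iterable, List


def clean_codes(texts: Iterable[str]) -> List[str]:
    """Strip whitespace and drop empty strings from scraped cell texts."""
    return [t.strip() for t in texts if t and t.strip()]


def scrape_reference_codes(url: str, timeout: float = 20.0) -> List[str]:
    """Open the page, wait for the result rows, return first-column texts."""
    # Selenium is imported here so clean_codes() stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: Selenium 4.6+ resolves the driver binary by itself.
    # On older versions, pass a Service object pointing at chromedriver.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching cell exists, or raise
        # TimeoutException once `timeout` seconds have passed.
        cells = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(
                (
                    By.CSS_SELECTOR,
                    "table#table_search_results tr[data-head] td:first-of-type",
                )
            )
        )
        return clean_codes(cell.text for cell in cells)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling scrape_reference_codes with the URL from the question should then return the list of codes, or raise a TimeoutException if the selector never matches. That exception at least distinguishes "nothing matched" from "looked too early".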
    