Can't get elements with Python and Selenium by Xpath

Time:11-12

I am trying to get the "job-title " and the "href" from this webpage with Python and Selenium.

It only returns blanks, no data.

job_card = driver.find_elements_by_xpath('//div[contains(@class,"job-info-wrapper ")]')
    
for job in job_card:
   
                              
    try:
        title = job.find_elements_by_xpath('.//a[contains(@class, "job-title ")]')
    except:
        title = job.find_elements_by_xpath('.//a[contains(@class, "job-title ")]').get_attribute(name="job-title ")
    titles.append(title)
    print(title)
   
    links.append(job.get_attribute(name="a href"))

This is the webpage:

[image: screenshot of the webpage]

What am I doing wrong here?

CodePudding user response:

Once you have the job card, just append its href and inner text. The click on the next-page link should also be unindented so that it runs once per page, after the inner loop over the cards. It also helps to use waits at the start to handle any pop-ups.

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumption: any WebDriver instance works here
wait=WebDriverWait(driver, 10)

driver.get('https://www.vietnamworks.com/job-search/all-jobs?filtered=true')

titles=[]
links =[]

###########################################################################################
# Click search Button 
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//a[contains(@class, "button searchBar__button")]'))).click()
except Exception:
    # The button may be missing or covered by a pop-up; carry on either way
    pass

try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//a[contains(@class, "button searchBar__button")]'))).click()
except Exception:
    pass

###########################################################################################
#loop

for i in range(0,20):

    # Wait for the job-title links on the current page
    job_card = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'job-info-wrapper ')]//a[@class='job-title priorityJob']")))
    print(len(job_card))
    for job in job_card:
        links.append(job.get_attribute("href"))
        titles.append(job.text)
        print(job.get_attribute("href"),job.text)

    try:
        # Go to the next page; wait.until raises TimeoutException if no clickable ">" link appears
        wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='page-link' and .='>']"))).click()
    except TimeoutException:
        break


print("Page: {}".format(str(i + 2)))

df_da=pd.DataFrame()
df_da['Title']=titles
df_da['Link']=links

print(df_da)

Output:

                                                Title                                               Link
0           QC Engineers (Tester, QA QC, Manual)(NEW)  https://www.vietnamworks.com/qc-engineers-test...
1                 Business Analyst (IT Industry)(NEW)  https://www.vietnamworks.com/business-analyst-...
2    Unity Game Developer (Up to 40,000,000 VNĐ)(NEW)  https://www.vietnamworks.com/unity-game-develo...
3   3D Modeler ( Background Modeler ) - Up to 30,0...  https://www.vietnamworks.com/3d-modeler-backgr...
4   Chuyên Viên Quản Trị Hệ Thống Công Nghệ Thông ...  https://www.vietnamworks.com/chuyen-vien-quan-...
5                              Financial Analyst(NEW)  https://www.vietnamworks.com/financial-analyst...
6   Chuyên Viên Cao Cấp Tuyển Dụng (Nghỉ Thứ 7 Và ...  https://www.vietnamworks.com/chuyen-vien-cao-c...
7                  Chuyên Viên Cao Cấp Tài Chính(NEW)  https://www.vietnamworks.com/chuyen-vien-cao-c...
8                    Trưởng Ban Kiểm Toán Nội Bộ(NEW)  https://www.vietnamworks.com/truong-ban-kiem-t...
9                         Dealer Operation Staff(NEW)  https://www.vietnamworks.com/dealer-operation-...
10          Supervisor - Tiếng Nhật - Phòng Sale(NEW)  https://www.vietnamworks.com/supervisor-tieng-...
11             IT Manager – Back Office Division(NEW)  https://www.vietnamworks.com/it-manager-back-o...
12  Hot Job - Nhân Viên Xuất Nhập Khẩu (Lương Thưở...  https://www.vietnamworks.com/hot-job-nhan-vien...
13  General Accountant for Luxury Brand - Attracti...  https://www.vietnamworks.com/general-accountan...
14   Chuyên Viên Kinh Doanh (Thương Mại Điện Tử)(NEW)  https://www.vietnamworks.com/chuyen-vien-kinh-...
15  Java Developer (Thu Nhập Tương Đương Từ 14 - 2...  https://www.vietnamworks.com/java-developer-th...
16  Logistics Executive (Salary up to 500$ Per mon...  https://www.vietnamworks.com/logistics-executi...
17  Customs Liquidation & Customs Declaration Staf...  https://www.vietnamworks.com/customs-liquidati...
18  Chuyên Viên Kinh Doanh Thiết Bị Y Tế - [Mức Lư...  https://www.vietnamworks.com/chuyen-vien-kinh-...
19                       Trade Operation Officer(NEW)  https://www.vietnamworks.com/trade-operation-o...
20                 Nhân Viên PR - Quản Lý Đô Thị(NEW)  https://www.vietnamworks.com/nhan-vien-pr-quan...

CodePudding user response:

As per the DOM, the job title is the text contained in the a tag.

Use .get_attribute("innerText") or .text to get the title from the job card.

To retrieve the href attribute from the element, use .get_attribute("href").

To find a single element, use find_element instead of find_elements. find_elements returns a list of WebElements, and a list has no get_attribute method.

Try like below.

driver.get("https://www.vietnamworks.com/job-search/all-jobs?filtered=true")

wait = WebDriverWait(driver,30)

try:
    wait.until(EC.element_to_be_clickable((By.XPATH,"//div[@class='sc-fznWqX dAkvW']//*[name()='svg' and @class='filter-close']"))).click()
except:
    print("No pop-up")
titles = []
links = []
job_card = driver.find_elements_by_xpath('//div[contains(@class,"job-info-wrapper ")]')

for job in job_card:
    element = job.find_element_by_xpath(".//a[contains(@class,'job-title')]")
    title = element.get_attribute("innerText")
    link = element.get_attribute("href")
    print(f"{title} : {link}")

Output:

No pop-up
Chuyên Viên Triển Khai Phần Mềm ERP / ERP Specialist(NEW) : https://www.vietnamworks.com/chuyen-vien-trien-khai-phan-mem-erp-erp-specialist-1438267-jd/?source=searchResults&searchType=2&placement=1438268&sortBy=date
[HN] Data Engineer(NEW) : https://www.vietnamworks.com/hn-data-engineer-2-1431499-jd/?source=searchResults&searchType=2&placement=1431500&sortBy=date
Chuyên Viên Pháp Chế(NEW) : https://www.vietnamworks.com/chuyen-vien-phap-che-510-1-1429155-jd/?source=searchResults&searchType=2&placement=1429156&sortBy=date
...
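
Note that newer Selenium 4 releases no longer provide the find_element_by_* and find_elements_by_* helpers, so the snippet above may raise AttributeError there. A minimal sketch of the same approach in the find_element(by, value) form follows. The function name collect_jobs is hypothetical, the class names in the XPath expressions are taken from the answers here and may no longer match the live page, and the string "xpath" is the value of By.XPATH (By.XPATH is the more usual spelling).

```python
def collect_jobs(driver):
    """Return (title, link) pairs for every job card on the current page.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the search results page.
    """
    jobs = []
    # find_elements returns a list of card elements (empty if nothing matches).
    cards = driver.find_elements("xpath", '//div[contains(@class,"job-info-wrapper")]')
    for card in cards:
        # find_element returns a single element, so get_attribute works on it.
        anchor = card.find_element("xpath", './/a[contains(@class,"job-title")]')
        title = (anchor.get_attribute("innerText") or "").strip()
        jobs.append((title, anchor.get_attribute("href")))
    return jobs
```

Calling collect_jobs(driver) after the page has loaded (for example after one of the waits shown above) should give a list that can be passed straight to pandas.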

CodePudding user response:

You were almost there. You just need two minor modifications:

  • get_attribute() is a method of a WebElement, so you need find_element* instead of find_elements*, which returns a list.
  • get_attribute() takes just the attribute name, for example get_attribute("class").
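
The first point can be illustrated without a browser. The sketch below uses a hypothetical stand-in class, FakeElement, rather than Selenium's real WebElement, and the URL is a placeholder. It only shows why calling get_attribute() on the list returned by a find_elements-style lookup fails, while calling it on a single element works.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (not the real class)."""

    def __init__(self, text, href):
        self._attrs = {"innerText": text, "href": href}

    def get_attribute(self, name):
        # Like the real method, return None for an attribute that is not set.
        return self._attrs.get(name)


# A find_elements-style lookup gives back a list, even for a single match.
found = [FakeElement("Data Engineer", "https://example.com/job/1")]

try:
    found.get_attribute("href")  # a list has no get_attribute method
    error = None
except AttributeError as exc:
    error = exc

# Picking one element out of the list (or using find_element) works.
first = found[0]
title = first.get_attribute("innerText")
link = first.get_attribute("href")
missing = first.get_attribute("a href")  # not a real attribute name
```

Here error holds the AttributeError, title and link hold the values, and missing is None, which mirrors the blank results described in the question.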

To get the values of the Job Title and href attribute you can use the following Locator Strategies:

driver.get("https://www.vietnamworks.com/job-search/all-jobs?filtered=true")
titles = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='job-info-wrapper ']//h3/a")))]
hrefs = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='job-info-wrapper ']//h3/a")))]
for i,j in zip(titles, hrefs):
    print(f"Title: {i} have Href: {j}")
driver.quit()

Console Output:

Title: Graphic Designer<span class="new">(New)</span> have Href: https://www.vietnamworks.com/graphic-designer-923-1434158-jd/?source=searchResults&searchType=2&placement=1434159&sortBy=date
Title: Minthacare Vietnam have Href: https://www.vietnamworks.com/jobs-at-minthacare-vietnam-e6599851-en
Title: Nhân Viên Kinh Doanh Quốc Tế Làm Việc Tại Tây Phi (Tiếng Anh Tốt, Ưu Tiên Sinh Viên Mới Ra Trường)<span class="new">(New)</span> have Href: https://www.vietnamworks.com/nhan-vien-kinh-doanh-quoc-te-lam-viec-tai-tay-phi-tieng-anh-tot-uu-tien-sinh-vien-moi-ra-truong-1-1434157-jd/?source=searchResults&searchType=2&placement=1434158&sortBy=date
Title: Công Ty Cổ Phần Tập Đoàn Tân Long have Href: https://www.vietnamworks.com/jobs-at-cong-ty-co-phan-tap-doan-tan-long-e4036305-en
Title: Technical Service Manager (Solar Inverter)<span class="new">(New)</span> have Href: https://www.vietnamworks.com/technical-service-manager-solar-inverter-1-1434006-jd/?source=searchResults&searchType=2&placement=1434007&sortBy=date
Title: Goodwe Technologies Co.,ltd. have Href: https://www.vietnamworks.com/jobs-at-goodwe-technologies-co-ltd--e6294177-en
Title: Business Development Executive (Solar Energy)<span class="new">(New)</span> have Href: https://www.vietnamworks.com/business-development-executive-solar-energy-2-1434040-jd/?source=searchResults&searchType=2&placement=1434041&sortBy=date
Title: Goodwe Technologies Co.,ltd. have Href: https://www.vietnamworks.com/jobs-at-goodwe-technologies-co-ltd--e6294177-en
Title: [HCM - Fresher] - Customer Development Next Gen (Thu Nhập 15 -25,000,000)<span class="new">(New)</span> have Href: https://www.vietnamworks.com/hcm-fresher-customer-development-next-gen-thu-nhap-15-25-000-000-1431677-jd/?source=searchResults&searchType=2&placement=1431678&sortBy=date
Title: One Mount have Href: https://www.vietnamworks.com/jobs-at-one-mount-e6166682-en
Title: Chuyên Viên Quan Hệ Khách Hàng - [BAC A BANK - Chi Nhánh Đông Anh/ Gia Lâm/ Khu Vực Hà Nội/ Quảng Ninh/ Hà Giang]<span class="new">(New)</span> have Href: https://www.vietnamworks.com/chuyen-vien-quan-he-khach-hang-bac-a-bank-chi-nhanh-dong-anh-gia-lam-khu-vuc-ha-noi-quang-ninh-ha-giang-1431466-jd/?source=searchResults&searchType=2&placement=1431467&sortBy=date
Title: Ngân Hàng Thương Mại Cổ Phần Bắc Á have Href: https://www.vietnamworks.com/jobs-at-ngan-hang-thuong-mai-co-phan-bac-a-e1350001-en
Title: IT Service Support Team Leader - Trưởng Nhóm Hỗ Trợ Dịch Vụ IT<span class="new">(New)</span> have Href: https://www.vietnamworks.com/it-service-support-team-leader-truong-nhom-ho-tro-dich-vu-it-1-1437975-jd/?source=searchResults&searchType=2&placement=1437976&sortBy=date
Title: Công Ty Cổ Phần Viễn Thông Di Động Vietnamobile have Href: https://www.vietnamworks.com/jobs-at-cong-ty-co-phan-vien-thong-di-dong-vietnamobile-e356273-en
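
The titles in this output still carry the <span class="new"> markup because innerHTML returns the raw HTML of the element; reading innerText or .text, as in the previous answer, avoids that. If only the innerHTML strings are available, one possible way to reduce them to plain text is the standard-library HTML parser. The names TextOnly and strip_tags below are hypothetical helpers, not part of Selenium.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


raw = 'Graphic Designer<span class="new">(New)</span>'
clean = strip_tags(raw)
print(clean)  # Graphic Designer(New)
```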