python selenium webscraping (clicking buttons which shows data and then extracting it)


What I'm trying to do: on https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64, open every listing; each one redirects to another page with a "Show how to apply" button, and clicking that button reveals an email address. I want to scrape every job listing's title and email address with my code. I have already scraped the titles and hrefs, but I have no idea what to do next (i.e., opening every job listing, clicking "Show how to apply", and scraping the email from there). I hope you understand what I'm after (sorry for my English).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
s = Service(r'C:\Program Files (x86)\chromedriver.exe')  # raw string so the backslashes aren't treated as escapes
driver = webdriver.Chrome(service=s)
driver.get('https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64')

# Get titles of Job listings
elements = []
for element in driver.find_elements(By.CLASS_NAME, 'resultJobItem'):
    title = element.find_element(By.XPATH, './/*[@class="noctitle"]').text  # the title span of the listing
    if title not in elements:
        elements.append({'Title': title.split('\n')})

# Get all href
link = driver.find_elements(By.XPATH, './/*[@class="results-jobs"]/article/a')
for links in link:
    elements.append({'Link': links.get_attribute('href')})

print(elements)

CodePudding user response:

Looks like you can use their own API with a POST request to get the data.

You'll need to scrape the job id first.

For the job at this URL: https://www.jobbank.gc.ca/jobsearch/jobposting/35213663, the job id is 1860693 (note it's not the same as the id in the URL), so I'd need to post a request like this:

import requests
from bs4 import BeautifulSoup as BS

url = "https://www.jobbank.gc.ca/jobsearch/jobposting/35213663"
jobid = "1860693"

# Form payload that mimics clicking the "Show how to apply" button
data = {
  'seekeractivity:jobid': jobid,
  'seekeractivity_SUBMIT': '1',
  'javax.faces.ViewState': 'stateless',
  'javax.faces.behavior.event': 'action',
  'jbfeJobId': jobid,
  'action': 'applynowbutton',
  'javax.faces.partial.event': 'click',
  'javax.faces.source': 'seekeractivity',
  'javax.faces.partial.ajax': 'true',
  'javax.faces.partial.execute': 'jobid',
  'javax.faces.partial.render': 'applynow',
  'seekeractivity': 'seekeractivity'
}

response = requests.post(url, data=data)

soup = BS(response.text, 'html.parser')  # explicit parser avoids the bs4 warning
email = soup.a.text                      # the first <a> in the partial response is the mailto link
print(email)
This gives me:
>> [email protected]
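
To cover every listing rather than one hard-coded id, the job id has to be scraped from each posting page first. A minimal sketch, assuming the numeric id appears near the jbfeJobId name somewhere in the page source (an unverified assumption; inspect the real HTML and adjust the pattern), reusing the same payload in a loop:

import re
import requests
from bs4 import BeautifulSoup as BS

def scrape_email(posting_url):
    # ASSUMPTION: the numeric job id appears next to "jbfeJobId" in the page
    # source; verify against the live page and adjust the regex if needed.
    html = requests.get(posting_url).text
    m = re.search(r'jbfeJobId\D*(\d+)', html)
    if not m:
        return None
    jobid = m.group(1)
    data = {
        'seekeractivity:jobid': jobid,
        'seekeractivity_SUBMIT': '1',
        'javax.faces.ViewState': 'stateless',
        'javax.faces.behavior.event': 'action',
        'jbfeJobId': jobid,
        'action': 'applynowbutton',
        'javax.faces.partial.event': 'click',
        'javax.faces.source': 'seekeractivity',
        'javax.faces.partial.ajax': 'true',
        'javax.faces.partial.execute': 'jobid',
        'javax.faces.partial.render': 'applynow',
        'seekeractivity': 'seekeractivity'
    }
    response = requests.post(posting_url, data=data)
    soup = BS(response.text, 'html.parser')
    return soup.a.text if soup.a else None

# Usage: feed in the hrefs collected with Selenium in the question
for link in ['https://www.jobbank.gc.ca/jobsearch/jobposting/35213663']:
    print(link, '->', scrape_email(link))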

CodePudding user response:

I would store all the links separately.
So assume the variable all_links contains all the links. Now,

.
.
.
driver.quit()

link1 = all_links[0]  # take the first link as an example; you'd loop over all of them: for link in all_links

new_driver = webdriver.Chrome(service=s)
new_driver.get(link1)

new_driver.find_element(By.CSS_SELECTOR, "#applynowbutton").click()

At this point the 'Show how to Apply' button has been clicked.

Unfortunately, I don't know too much about HTML, but essentially at this point you can extract the email much like you extracted the links previously.
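
A minimal sketch of that last step, assuming the revealed address is rendered as a mailto: link (an assumption worth checking in the page source) and using an explicit wait so the content has time to appear:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 s for an anchor whose href starts with "mailto:",
# then strip the scheme to get the plain address.
email_link = WebDriverWait(new_driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'a[href^="mailto:"]'))
)
email = email_link.get_attribute('href').replace('mailto:', '')
print(email)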

CodePudding user response:

Try like below:

You can apply scrollIntoView to each job option; when the loop runs past the loaded results, click the Show more button and continue extracting details.

import time

driver.get("https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64")

i = 0
while True:
    try:
        jobs = driver.find_elements(By.XPATH, "//div[@class='results-jobs']/article")
        driver.execute_script("arguments[0].scrollIntoView(true);", jobs[i])
        title = jobs[i].find_element(By.XPATH, ".//span[@class='noctitle']").text
        link = jobs[i].find_element(By.TAG_NAME, "a").get_attribute("href")
        print(f"{i + 1} - {title} : {link}")
        i += 1
        if i == 100:
            break
    except IndexError:
        # Ran past the loaded results: load the next batch and retry
        driver.find_element(By.ID, "moreresultbutton").click()
        time.sleep(3)
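
If the fixed time.sleep(3) proves flaky, a more robust option is an explicit wait on the Show more button. A sketch using Selenium's built-in waits, meant to replace the click + sleep in the except branch:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 s for the "Show more" button to become clickable, then click it.
more = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "moreresultbutton"))
)
more.click()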