So what I'm trying to do is this: on https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64 I want to open every listing. Each listing redirects to another page with a "Show how to apply" button, and clicking that button reveals an email address. I want my code to scrape every job listing's title and email address. I've already scraped the titles and hrefs, but I have no idea what to do next (e.g. clicking on every job listing, then clicking "Show how to apply" and scraping the email from there). I hope you understand what I want to do (sorry for my English).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Raw string so the backslashes in the Windows path are not treated as escape sequences
s = Service(r'C:\Program Files (x86)\chromedriver.exe')
driver = webdriver.Chrome(service=s)
driver.get('https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64')
# Get titles of job listings
# (the attribute inside the [@...] XPath predicates below was lost from the original post)
elements = []
seen_titles = set()
for element in driver.find_elements(By.CLASS_NAME, 'resultJobItem'):
    title = element.find_element(By.XPATH, './/*[@]').text
    if title not in seen_titles:  # compare against seen titles, not the dicts stored in elements
        seen_titles.add(title)
        elements.append({'Title': title.split('\n')})
# Get all hrefs
link = driver.find_elements(By.XPATH, './/*[@]/article/a')
for links in link:
    elements.append({'Link': links.get_attribute('href')})
print(elements)
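The second answer below assumes a list named `all_links`. If the links are collected as in the code above (one flat list mixing `{'Title': ...}` and `{'Link': ...}` dicts), a small helper can pull them out and drop duplicates. This is a minimal sketch; `collect_links` is a hypothetical name, not part of the original post:

```python
def collect_links(elements: list[dict]) -> list[str]:
    """Return the hrefs from a mixed Title/Link list, without duplicates."""
    links = [item["Link"] for item in elements if "Link" in item]
    # dict.fromkeys keeps the first occurrence of each link, in the original order
    return list(dict.fromkeys(links))
```

For example, `all_links = collect_links(elements)`.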
Answer:
It looks like you can use their own API: a POST request returns the data directly.
You'll need to scrape the job id first.
For the job at https://www.jobbank.gc.ca/jobsearch/jobposting/35213663, I see that the job id is 1860693 (a different number from the one in the URL), so I need to post a request like this:
import requests
from bs4 import BeautifulSoup as BS

url = "https://www.jobbank.gc.ca/jobsearch/jobposting/35213663"
jobid = "1860693"  # internal job id, not the number at the end of the URL
# Form data mimicking the AJAX request sent by the "Show how to apply" button
data = {
    'seekeractivity:jobid': jobid,
    'seekeractivity_SUBMIT': '1',
    'javax.faces.ViewState': 'stateless',
    'javax.faces.behavior.event': 'action',
    'jbfeJobId': jobid,
    'action': 'applynowbutton',
    'javax.faces.partial.event': 'click',
    'javax.faces.source': 'seekeractivity',
    'javax.faces.partial.ajax': 'true',
    'javax.faces.partial.execute': 'jobid',
    'javax.faces.partial.render': 'applynow',
    'seekeractivity': 'seekeractivity'
}
response = requests.post(url, data=data)
soup = BS(response.text)  # no parser given, so bs4 picks the best one installed
email = soup.a.text  # the first <a> in the response is the email link
print(email)
This prints:
>> [email protected]
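To repeat this for many postings, the form data can be wrapped in a function that takes the job id. The sketch below (helper names are hypothetical) also looks for `mailto:` links explicitly instead of taking the first `<a>` in the response. It assumes the response is a standard JSF partial-response XML document whose `update` elements hold the rendered HTML as CDATA; that fits the `javax.faces.partial.*` fields being sent, but it has not been checked against the live site.

```python
import re
import xml.etree.ElementTree as ET

# Captures the address in href="mailto:..." (stops at a quote or a ?subject=... query)
MAILTO_RE = re.compile(r"""href=["']mailto:([^"'?]+)""", re.IGNORECASE)


def build_apply_payload(jobid: str) -> dict[str, str]:
    """Form data for the 'Show how to apply' request, same fields as in the answer above."""
    return {
        "seekeractivity:jobid": jobid,
        "seekeractivity_SUBMIT": "1",
        "javax.faces.ViewState": "stateless",
        "javax.faces.behavior.event": "action",
        "jbfeJobId": jobid,
        "action": "applynowbutton",
        "javax.faces.partial.event": "click",
        "javax.faces.source": "seekeractivity",
        "javax.faces.partial.ajax": "true",
        "javax.faces.partial.execute": "jobid",
        "javax.faces.partial.render": "applynow",
        "seekeractivity": "seekeractivity",
    }


def emails_from_partial_response(xml_text: str) -> list[str]:
    """Collect mailto: addresses from the HTML fragments inside a JSF partial-response."""
    root = ET.fromstring(xml_text)
    emails: list[str] = []
    # Assumption: each <update> element carries a rendered HTML fragment as its (CDATA) text
    for update in root.iter("update"):
        for address in MAILTO_RE.findall(update.text or ""):
            if address not in emails:
                emails.append(address)
    return emails
```

With `requests`, this would be roughly `resp = requests.post(url, data=build_apply_payload(jobid))` followed by `emails_from_partial_response(resp.text)`. You still need each posting's own job id, which is not the number in its URL.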
Answer:
I would store all the links separately.
Assume the variable all_links contains all the links. Then:
# ... (code that collects all_links)
driver.quit()
# Take the first link as an example; you'd have to loop over all of them: for link in all_links:
link1 = all_links[0]
new_driver = webdriver.Chrome(service=s)
new_driver.get(link1)
new_driver.find_element(By.CSS_SELECTOR, "#applynowbutton").click()
At this point the "Show how to apply" button has been clicked.
I don't know much about HTML, but from here you should be able to extract the email the same way you extracted the links earlier.
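Rather than reading the page immediately after the click, an explicit wait can block until the address appears. A minimal sketch, assuming the revealed email is rendered as an `<a href="mailto:...">` link (the `a[href^='mailto:']` selector is an assumption; `#applynowbutton` comes from the code above). The helper names are hypothetical:

```python
from urllib.parse import unquote


def email_from_mailto(href: str) -> str:
    """Turn 'mailto:someone@example.com?subject=Hi' into 'someone@example.com'."""
    if not href.lower().startswith("mailto:"):
        raise ValueError(f"not a mailto link: {href!r}")
    address = href[len("mailto:"):].split("?", 1)[0]
    return unquote(address)  # undo percent-encoding such as %40


def click_and_get_email(driver, timeout: float = 10) -> str:
    """On an already-opened job posting, click 'Show how to apply' and return the email."""
    # Selenium is imported here so email_from_mailto() can be used without it installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.find_element(By.CSS_SELECTOR, "#applynowbutton").click()
    # Assumption: the revealed address is rendered as an <a href="mailto:..."> link
    link = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "a[href^='mailto:']"))
    )
    return email_from_mailto(link.get_attribute("href"))


# Usage sketch, reusing one browser instead of starting a new one per link:
# for link in all_links:
#     driver.get(link)
#     print(link, click_and_get_email(driver))
```

Reusing one browser avoids starting a new Chrome per posting. If no `mailto:` link becomes visible within `timeout` seconds, `WebDriverWait.until` raises `TimeoutException`, so you may want to catch that for postings that don't show an email.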
Answer:
Try something like the code below.
You can apply scrollIntoView to each job in turn. When you reach the end of the loaded results, click the "Show more" button and continue extracting details.
import time
from selenium.webdriver.common.by import By

driver.get("https://www.jobbank.gc.ca/jobsearch/jobsearch?sort=D&fsrc=16&fbclid=IwAR2SIG3lbY1S9lO4WilcKw6TxJAJQbFIGYTVE_tOTqYRpb43qM3uYgLWV64")
i = 0
while True:
    try:
        jobs = driver.find_elements(By.XPATH, "//div[@class='results-jobs']/article")
        # Scroll the i-th job into view; jobs[i] raises IndexError once all loaded jobs are done
        driver.execute_script("arguments[0].scrollIntoView(true);", jobs[i])
        title = jobs[i].find_element(By.XPATH, ".//span[@class='noctitle']").text
        link = jobs[i].find_element(By.TAG_NAME, "a").get_attribute("href")
        print(f"{i + 1} - {title} : {link}")
        i += 1
        if i == 100:
            break
    except IndexError:
        # Out of loaded jobs: click "Show more" and give the next batch time to render
        driver.find_element(By.ID, "moreresultbutton").click()
        time.sleep(3)
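In the loop above, the `IndexError` doubles as the signal to load more results, and the fixed `time.sleep(3)` may be too short or too long. One alternative is to separate the paging logic from Selenium: a small generator (the name `iter_with_load_more` is hypothetical) that yields items in order, asks for another batch only when it runs out, and stops cleanly when none is available. The Selenium wiring in the trailing comments reuses the XPath and button id from the answer and is an untested sketch.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_with_load_more(
    fetch_all: Callable[[], Sequence[T]],
    load_more: Callable[[], bool],
    limit: int,
) -> Iterator[T]:
    """Yield up to `limit` items in page order, requesting a new batch whenever they run out.

    fetch_all() returns every item loaded so far; load_more() should load the next batch
    and return True once new items are present, or False when nothing more can be loaded.
    """
    i = 0
    while i < limit:
        items = fetch_all()  # re-read each time: the list grows after load_more()
        if i < len(items):
            yield items[i]
            i += 1
        elif not load_more():  # no more batches: stop instead of raising or looping forever
            break


# Usage sketch with Selenium (XPath and button id from the answer above):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# JOBS = "//div[@class='results-jobs']/article"
#
# def load_more():
#     before = len(driver.find_elements(By.XPATH, JOBS))
#     buttons = driver.find_elements(By.ID, "moreresultbutton")
#     if not buttons:
#         return False
#     buttons[0].click()
#     # Wait until more articles exist instead of sleeping a fixed 3 seconds
#     WebDriverWait(driver, 10).until(lambda d: len(d.find_elements(By.XPATH, JOBS)) > before)
#     return True
#
# jobs = iter_with_load_more(lambda: driver.find_elements(By.XPATH, JOBS), load_more, 100)
# for n, job in enumerate(jobs, 1):
#     driver.execute_script("arguments[0].scrollIntoView(true);", job)
#     print(n, job.find_element(By.XPATH, ".//span[@class='noctitle']").text)
```

`load_more` must only return True once the list has actually grown; otherwise the generator would keep calling it. In the Selenium wiring, `WebDriverWait.until` raises `TimeoutException` if no new articles appear within 10 seconds.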