I am using Beautiful Soup, and below is the selector I use to scrape the href values.
from bs4 import BeautifulSoup

html = '''<a data-testid="Link"
href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m">'''
soup = BeautifulSoup(html, "lxml")
# Matching on the full class string breaks whenever the generated class names change
jobs = soup.find_all("a", class_="sc-pciXn eUevWj JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP")
for job in jobs:
    job_url = job.get("href")
I am using find_all because there are 3 elements with hrefs in total.
The method above works, but the website changes the class names on a daily basis. I need a CSS selector or XPath that does not depend on those classes.
CodePudding user response:
Try:
import requests
from bs4 import BeautifulSoup
url = "https://join.com/companies/talpasolutions"
soup = BeautifulSoup(requests.get(url).content, "lxml")
# Select every <a> that contains an <h3>, so no class names are needed
for a in soup.select("a:has(h3)"):
    print(a.get("href"))
Prints:
https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m
https://join.com/companies/talpasolutions/4925936-senior-data-engineer-d-f-m
https://join.com/companies/talpasolutions/4926107-senior-data-scientist-d-f-m
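If you want to compare class-independent selectors without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML markup, the class names in it, and the extract_job_urls helper are all made up for illustration and only modelled on the snippet in the question, so the real page may be structured differently. Besides a:has(h3), it tries the data-testid attribute shown in the question and a match on the href prefix. Either of those may also pick up unrelated links on the live page, so check the results before relying on them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's snippet; the real page may differ.
SAMPLE_HTML = """
<div>
  <a data-testid="Link" class="sc-abc123"
     href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m"><h3>Job A</h3></a>
  <a data-testid="Link" class="sc-def456"
     href="https://join.com/companies/talpasolutions/4925936-senior-data-engineer-d-f-m"><h3>Job B</h3></a>
  <a class="sc-nav" href="https://join.com/about">About</a>
</div>
"""


def extract_job_urls(html, selector):
    """Return the href of every element matched by a CSS selector (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(selector)]


# 1. Structural: anchors that contain an <h3>, as in the answer above.
by_heading = extract_job_urls(SAMPLE_HTML, "a:has(h3)")

# 2. Attribute: anchors carrying the data-testid seen in the question's HTML.
by_testid = extract_job_urls(SAMPLE_HTML, 'a[data-testid="Link"][href]')

# 3. URL prefix: anchors whose href starts with the company's job path.
by_prefix = extract_job_urls(
    SAMPLE_HTML, 'a[href^="https://join.com/companies/talpasolutions/"]'
)

for url in by_heading:
    print(url)
```

On this sample all three selectors return the same two job links and skip the "About" link. None of them refers to the generated class names, which is the part that changes daily.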