I'm building a fully automated job-application tool. Funnily enough, the automation part is fairly simple; the scraping, not so much.
In short, requests and beautifulsoup have worked for the majority of domains I am scraping; however, nothing works when I try the same process on Workable pages:
import requests
from bs4 import BeautifulSoup as bs
session = requests.Session()
url = 'https://apply.workable.com/breederdao-1/j/602097ACC9/'
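req = session.get(url)
# Parse the returned HTML; soup was used below without being defined
soup = bs(req.text, 'html.parser')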
title = soup.find('h1', {'data-ui': 'job-title'})
print(title)
>>> None
details = soup.find('span', {'data-ui': 'job-location'})
print(details)
>>> None
Both elements are under body; however, when I try to fetch the page's title, I do get what I expect:
title_0 = soup.find('title')
print(title_0)
>>> <title>Data Analyst (Fully Remote) - BreederDAO</title>
I tried using await with HTMLSession / AsyncHTMLSession as well, but as long as the element is inside body, every find() still returns None.
Can anyone educate me on this? My current hypothesis is that the website has some kind of anti-scraping mechanism, but I have zero idea where to even start looking. This element does look extra sus though:
<html...
<head>...</head>
<body>
.
.
.
<noscript>
<iframe height="0" width="0" src="https://www.googletagmanager.com/ns.html?id=GTM-WKS7WTT&gtm_auth=SGnzIn3pcB7S4fevFXOKPQ&gtm_preview=env-2&gtm_cookies_win=x" style="display: none; visibility: hidden;">
#document
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>ns</title>
</head>
<body>
" "
</body>
</html>
</iframe>
</noscript>
.
.
.
</body>
</html>
CodePudding user response:
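The data you see is loaded from an external URL via JavaScript, so it is not in the HTML that requests receives, and find() has nothing to match. You can fetch that URL directly with the requests module. For example: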
import json
import requests
# 602097ACC9 is from your URL
url = "https://apply.workable.com/api/v2/accounts/breederdao-1/jobs/602097ACC9"
data = requests.get(url).json()
# uncomment to print all data:
# print(json.dumps(data, indent=4))
print(data["title"])
print(", ".join(data["location"].values()))
Prints:
Data Analyst (Fully Remote)
Philippines, PH, Makati, Metro Manila
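Since you are scraping many postings, here is a minimal sketch of how the API URL could be derived from a job page URL instead of being typed by hand. It assumes every Workable page follows the same /<account>/j/<shortcode>/ shape as your example and that the endpoint above stays the same; workable_api_url and API_TEMPLATE are hypothetical names of my own, not part of any library.

```python
from urllib.parse import urlparse

# Assumption: same endpoint pattern as the one used in the answer above.
API_TEMPLATE = "https://apply.workable.com/api/v2/accounts/{account}/jobs/{shortcode}"


def workable_api_url(page_url: str) -> str:
    """Build the JSON API URL for a Workable job page URL (hypothetical helper)."""
    # Split the path into its non-empty segments, e.g. ['breederdao-1', 'j', '602097ACC9']
    parts = [p for p in urlparse(page_url).path.split("/") if p]
    # Assumed shape: <account>/j/<shortcode>; anything else is rejected
    if len(parts) != 3 or parts[1] != "j":
        raise ValueError(f"Unexpected Workable job URL: {page_url}")
    account, _, shortcode = parts
    return API_TEMPLATE.format(account=account, shortcode=shortcode)


print(workable_api_url("https://apply.workable.com/breederdao-1/j/602097ACC9/"))
# → https://apply.workable.com/api/v2/accounts/breederdao-1/jobs/602097ACC9
```

You can then pass the result to requests.get(...).json() exactly as in the snippet above.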