I am building a fairly simple BeautifulSoup/requests web scraper, but when I run it against a jobs website I get this error:
AttributeError: 'NoneType' object has no attribute 'find_all'
Here is my code:
import requests
from bs4 import BeautifulSoup
URL = "https://uk.indeed.com/jobs?q&l=Norwich, Norfolk&vjk=139a4549fe3cc48b"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
job_elements = results.find_all("div", class_="resultContent")
python_jobs = results.find_all("h2", string="Python")
for job_element in job_elements:
    title_element = job_element.find("h2", class_="jobTitle")
    company_element = job_element.find("span", class_="companyName")
    location_element = job_element.find("div", class_="companyLocation")
    print(title_element)
    print(company_element)
    print(location_element)
    print()
Does anyone know what the issue is?
CodePudding user response:
Check the selector you use for results: the id attribute should be resultsBody. With the wrong id, soup.find() returns None, so every later line that calls a method on results fails, because None has no find_all attribute:
results = soup.find(id="resultsBody")
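To see why the wrong id produces this particular error, here is a minimal sketch that runs against a small, made-up HTML snippet instead of the live site. SAMPLE_HTML and the helper find_all_or_empty are hypothetical names introduced for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real markup may differ.
SAMPLE_HTML = """
<table id="resultsBody">
  <tr><td class="resultContent"><h2 class="jobTitle">Python Developer</h2></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns None when nothing matches, it does not raise.
missing = soup.find(id="ResultsContainer")  # no element has this id
found = soup.find(id="resultsBody")         # matches the <table>


def find_all_or_empty(container, *args, **kwargs):
    """Call container.find_all(...), or return [] if the container is None."""
    if container is None:
        return []
    return container.find_all(*args, **kwargs)


missing_cells = find_all_or_empty(missing, "td", class_="resultContent")
found_cells = find_all_or_empty(found, "td", class_="resultContent")

print(missing)           # None
print(len(found_cells))  # 1
```

A guard like this will not repair a wrong selector, but it turns the AttributeError into an empty result that is easier to check for.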
Also, for job_elements, the element you want is a td, not a div:
job_elements = results.find_all("td", class_="resultContent")
You could also combine both lookups into a single CSS selector:
job_elements = soup.select('#resultsBody td.resultContent')
To get only the results whose title contains Python:
job_elements = soup.select('#resultsBody td.resultContent:has(h2:-soup-contains("Python"))')
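The two selectors can be tried offline as well. The sketch below uses a made-up SAMPLE_HTML (an assumption, not the real page) to show that select() with a descendant selector finds the same cells as the find()/find_all() chain, and that :has() with :-soup-contains() narrows them by title text. It assumes a soupsieve version recent enough to support :-soup-contains (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the answer.
SAMPLE_HTML = """
<table id="resultsBody"><tr>
  <td class="resultContent"><h2 class="jobTitle">Python Developer</h2></td>
  <td class="resultContent"><h2 class="jobTitle">Java Developer</h2></td>
  <td class="sidebar"><h2>Python tips</h2></td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Two-step lookup: find the container, then the cells inside it.
chained = soup.find(id="resultsBody").find_all("td", class_="resultContent")

# Same lookup expressed as one CSS selector.
selected = soup.select("#resultsBody td.resultContent")

# Keep only cells that contain an <h2> whose text includes "Python".
python_only = soup.select(
    '#resultsBody td.resultContent:has(h2:-soup-contains("Python"))'
)
python_titles = [td.h2.get_text(strip=True) for td in python_only]

print(len(selected))
print(python_titles)
```

Note that select() returns an empty list when nothing matches, so a wrong selector here gives no results rather than an AttributeError.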
Full example:
import requests
from bs4 import BeautifulSoup
URL = "https://uk.indeed.com/jobs?q&l=Norwich, Norfolk&vjk=139a4549fe3cc48b"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="resultsBody")
job_elements = results.find_all("td", class_="resultContent")
python_jobs = results.find_all("h2", string="Python")
for job_element in job_elements:
    title_element = job_element.find("h2", class_="jobTitle")
    company_element = job_element.find("span", class_="companyName")
    location_element = job_element.find("div", class_="companyLocation")
    print(title_element)
    print(company_element)
    print(location_element)
    print()
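One more thing to watch in the code above: the python_jobs line passes string="Python", which matches only an h2 whose text is exactly "Python", so a title such as "Python Developer" is skipped. A minimal sketch of the difference, again on made-up HTML (SAMPLE_HTML and the job titles are assumptions for illustration), using a compiled regular expression for partial matches and get_text() to print the text instead of the whole tag:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical job titles; the real page content will differ.
SAMPLE_HTML = """
<h2 class="jobTitle">Python</h2>
<h2 class="jobTitle">Senior Python Developer</h2>
<h2 class="jobTitle">Java Developer</h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string must equal the tag's text exactly.
exact = soup.find_all("h2", string="Python")

# A compiled pattern is applied with search(), so it matches substrings.
partial = soup.find_all("h2", string=re.compile("Python"))

# get_text(strip=True) returns the text content without the surrounding tag.
exact_titles = [h2.get_text(strip=True) for h2 in exact]
partial_titles = [h2.get_text(strip=True) for h2 in partial]

print(exact_titles)
print(partial_titles)
```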