I'm trying to extract data from a job search website using BeautifulSoup. I've been able to extract all of the data I need except the displayed salary.
The webpage is https://mx.indeed.com/jobs?q=operador&l=Ciudad de México
The problem is that the salary is inside a <span> without a class name or title.
The sample HTML looks like this:
<div class="heading6 tapItem-gutter metadataContainer"><div class="metadata salary-snippet-container"><div aria-label="$12,000 al mes" class="salary-snippet"><span>$12,000 al mes</span></div></div></div>
I tried:
salary = card.find("div", {"class" : "salary-snippet"}).find("span").text
But I get the following error:
AttributeError: 'NoneType' object has no attribute 'find'
Can anyone please explain how I can fix this?
Answer:
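What happens?
The sample looks fine, but if you take a closer look, not every job card contains a salary element. For cards without one, card.find("div", {"class" : "salary-snippet"}) returns None, and calling .find("span") on None raises the AttributeError.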
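To see the failure without hitting the live site, here is a minimal offline sketch. The first card copies the salary markup from the question; the second card, the `#cards` wrapper and the company name are made up for illustration. It uses Python's built-in `html.parser` so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Two hypothetical cards: the first copies the salary markup from the
# question, the second (made up for illustration) has no salary at all.
html = """
<div id="cards">
  <a><div aria-label="$12,000 al mes" class="salary-snippet"><span>$12,000 al mes</span></div></a>
  <a><span class="companyName">Empresa sin salario</span></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cards = soup.select("#cards a")

outcomes = []
for card in cards:
    snippet = card.find("div", {"class": "salary-snippet"})  # None on the second card
    try:
        outcomes.append(snippet.find("span").text)
    except AttributeError as err:
        # Chaining .find() onto None reproduces the error from the question
        outcomes.append(f"AttributeError: {err}")

print(outcomes)
```

The first card yields the salary text; the second yields the same `'NoneType' object has no attribute 'find'` error as in the question.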
How to fix?
Check whether the element is present before calling .text on it:
salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
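Since the <span> itself has no class, one option is to reach it through its parent's class with the child combinator `div.salary-snippet > span`; another is to read the parent's `aria-label` attribute, which in the sample carries the same text. A small sketch of both, with the `None` check folded into hypothetical helpers (`salary_of`, `salary_label_of`) so each lookup happens once. The markup is the question's sample plus a made-up card without a salary.

```python
from bs4 import BeautifulSoup

html = """
<div id="cards">
  <a><div aria-label="$12,000 al mes" class="salary-snippet"><span>$12,000 al mes</span></div></a>
  <a><span class="companyName">Empresa sin salario</span></a>
</div>
"""

def salary_of(card):
    """Hypothetical helper: salary text of a card, or None if it has none."""
    # The <span> has no class, but its parent <div> does
    span = card.select_one("div.salary-snippet > span")
    return span.get_text(strip=True) if span is not None else None

def salary_label_of(card):
    """Hypothetical alternative: read the parent div's aria-label attribute."""
    div = card.select_one("div.salary-snippet")
    # Tag.get returns None when the attribute is missing
    return div.get("aria-label") if div is not None else None

soup = BeautifulSoup(html, "html.parser")
salaries = [salary_of(card) for card in soup.select("#cards a")]
labels = [salary_label_of(card) for card in soup.select("#cards a")]
print(salaries)  # ['$12,000 al mes', None]
```

Whether `aria-label` is reliable on the live page is an assumption; the span text is the safer source if the two ever differ.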
Example:
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, sent with the request below
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}

r = requests.get('https://mx.indeed.com/trabajo?q=operador&l=Ciudad de México&vjk=970d586d3023d4d0', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')

data = []
for card in soup.select('#mosaic-provider-jobcards a'):
    # Each field falls back to None when the card lacks that element
    companyName = card.select_one('span.companyName').text if card.select_one('span.companyName') else None
    companyLocation = card.select_one('div.companyLocation').text if card.select_one('div.companyLocation') else None
    salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
    data.append({
        'companyName': companyName,
        'companyLocation': companyLocation,
        'salary': salary
    })

print(data)
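The loop above calls select_one twice per field (once for the check, once for .text). A sketch of a hypothetical helper, `text_or_none`, that does each lookup once is shown below. It runs against a small made-up snippet shaped like the `#mosaic-provider-jobcards a` cards (company names are invented), so no network request is needed.

```python
from bs4 import BeautifulSoup

def text_or_none(card, selector):
    """Hypothetical helper: stripped text of the first match, or None."""
    element = card.select_one(selector)  # looked up once instead of twice
    return element.get_text(strip=True) if element is not None else None

html = """
<div id="mosaic-provider-jobcards">
  <a>
    <span class="companyName">Empresa A</span>
    <div class="companyLocation">Ciudad de México</div>
    <div class="salary-snippet"><span>$12,000 al mes</span></div>
  </a>
  <a>
    <span class="companyName">Empresa B</span>
    <div class="companyLocation">Ciudad de México</div>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
data = [
    {
        "companyName": text_or_none(card, "span.companyName"),
        "companyLocation": text_or_none(card, "div.companyLocation"),
        "salary": text_or_none(card, "div.salary-snippet"),
    }
    for card in soup.select("#mosaic-provider-jobcards a")
]
print(data)
```

With real pages you would pass the `soup` built from the `requests` response instead of the sample string.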
If you only want to keep jobs that list a salary:
data = []
for card in soup.select('#mosaic-provider-jobcards a'):
    companyName = card.select_one('span.companyName').text if card.select_one('span.companyName') else None
    companyLocation = card.select_one('div.companyLocation').text if card.select_one('div.companyLocation') else None
    salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
    # Skip cards without a salary snippet
    if salary:
        data.append({
            'companyName': companyName,
            'companyLocation': companyLocation,
            'salary': salary
        })

print(data)
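Alternatively, you can collect every job first and filter afterwards, which keeps the full list available as well. A minimal sketch on made-up rows shaped like the dictionaries built above:

```python
# Hypothetical rows shaped like the dictionaries built in the scraping loop
data = [
    {"companyName": "Empresa A", "companyLocation": "Ciudad de México", "salary": "$12,000 al mes"},
    {"companyName": "Empresa B", "companyLocation": "Ciudad de México", "salary": None},
]

# Keep only jobs whose salary was found (None and "" are both dropped)
with_salary = [job for job in data if job["salary"]]
print(with_salary)
```

Filtering in the loop saves a little memory; filtering afterwards lets you report how many jobs had no salary at all.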