Hi, I am trying to test my knowledge of BeautifulSoup on the website https://pythonjobs.github.io/.
I want to print each listing with its job role, location, company, etc.
import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
job = soup.find(id='container')
job_elements = job.find_all(class_='job')
for job_element in job_elements:
    location = job_element.find(class_='i-globe')
    date = job_element.find('i', class_='calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')
This is the code I have, but everything comes back as None.
For reference, here is a snippet of the HTML for one job listing on the site. I have found that job_element.find('info') returns all the information, but I can't find a way to isolate each piece, such as the company or the location. How do I do this? Thanks
<section>
<div class="job" data-order="0" data-slug="datadog-open-source-software-engineer-python" data-tags="python,django,flask,falcon,celery">
  <a href="/jobs/datadog-open-source-software-engineer-python.html">
    Read more <i></i>
  </a>
  <h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>
  <span class="info"><i class="i-globe"></i> New York City or Remote</span>
  <span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
  <span class="info"><i class="i-chair"></i> permanent</span>
  <span class="info"><i class="i-company"></i> Datadog</span>
Answer:
Because the text is not inside the <i> element (it follows it), you should use .next or .next_sibling to get it. Also check your selectors: the class is class_='i-calendar', not class_='calendar':
# job_elements comes from the code in the question
jobs = []
for job_element in job_elements:
    jobs.append({
        'location': job_element.find(class_='i-globe').next,
        'date': job_element.find('i', class_='i-calendar').next,
        'length': job_element.find('i', class_='i-chair').next,
        'company_name': job_element.find('i', class_='i-company').next,
        'description': job_element.find('p', class_='detail').text
    })
jobs
Output
[{'location': ' New York City or Remote',
'date': ' Thu, 03 Jun 2021',
'length': ' permanent',
'company_name': ' Datadog',
'description': ' The\xa0Role In this role on our APM (tracing/profiling/debugging) team you will: Write open source code that instruments thousands of Python applications around the world. Drive our open source Python projects and...'},
{'location': ' remote',
'date': ' Sun, 11 Apr 2021',
'length': ' permanent, part-time possible',
'company_name': ' RealRate GmbH',
'description': ' RealRate is Hiring Senior Python\xa0Developers! RealRate, the Artificial Intelligence rating agency is growing. We’re looking for a senior Python\xa0developer: More than 8 years of project\xa0experience. Python\xa0senior. Data...'},...]
Answer:
import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
job = soup.find(id='container')
job_elements = job.find_all(class_='job')
for job_element in job_elements:
    location = job_element.find('i', class_='i-globe')
    date = job_element.find('i', class_='i-calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')  # a <p> tag: use description.text
    # The text of each <span> follows its <i> icon, so read .next (a string) instead
    print(location.next.strip())
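A variation on both answers, offered as a sketch rather than a fix: both snippets raise AttributeError if a listing lacks one of the icons, because find() then returns None. The hypothetical helper icon_text below returns None for a missing field instead. It also swaps find() for CSS selectors via select_one()/select(), which is a different technique from either answer. The inline HTML is assumed markup so the example runs offline; for the live site you would pass page.content to parse_jobs instead.

```python
from bs4 import BeautifulSoup

# Assumed markup for two listings; the second has no company icon and no detail.
HTML = """
<div id="container">
  <div class="job">
    <span class="info"><i class="i-globe"></i> New York City or Remote</span>
    <span class="info"><i class="i-company"></i> Datadog</span>
    <p class="detail">Write open source code.</p>
  </div>
  <div class="job">
    <span class="info"><i class="i-globe"></i> remote</span>
  </div>
</div>
"""

# Icon class -> output field name, following the mapping used in the answers.
FIELDS = {
    "i-globe": "location",
    "i-calendar": "date",
    "i-chair": "length",
    "i-company": "company_name",
}


def icon_text(job, icon_class):
    """Hypothetical helper: stripped text after <i class=icon_class>, or None if absent."""
    icon = job.select_one(f"i.{icon_class}")
    if icon is None or icon.next_sibling is None:
        return None
    return str(icon.next_sibling).strip()


def parse_jobs(html):
    """Return one dict per .job element inside #container."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for job in soup.select("#container .job"):
        record = {name: icon_text(job, cls) for cls, name in FIELDS.items()}
        detail = job.select_one("p.detail")
        record["description"] = detail.get_text(strip=True) if detail else None
        jobs.append(record)
    return jobs


for job in parse_jobs(HTML):
    print(job)
```

Returning None keeps every record the same shape, which makes the list easy to load into a table later; whether a missing field should be None or skipped entirely is a design choice for your use case.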