How to use BeautifulSoup to find data of a class within a class

Time:09-22

Hi, I am trying to test my knowledge of BeautifulSoup on the website https://pythonjobs.github.io/.

I want to print each listing with its job role, location, company, etc.

import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

job = soup.find(id='container')
job_elements = job.find_all(class_='job')


for job_element in job_elements:
    location = job_element.find(class_='i-globe')
    date = job_element.find('i', class_='calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')

This is the code I have, but it all returns None.

For reference, here is a snippet of the HTML on the website for one job listing. I have found that job_element.find('info') returns all the information, but I can't see a way to isolate each piece, such as company or location. How do I do this? Thanks

<section >
            <div  data-order="0" data-slug="datadog-open-source-software-engineer-python" data-tags="python,django,flask,falcon,celery">
            <a 
                href="/jobs/datadog-open-source-software-engineer-python.html">
                Read more <i ></i>
            </a>
                    <h1><a href="/jobs/datadog-open-source-software-engineer-python.html">Open Source Software Engineer - Python</a></h1>

    <span ><i ></i> New York City or Remote</span>
    <span ><i ></i> Thu, 03 Jun 2021</span>
    <span ><i ></i> permanent</span>
    <span ><i ></i> Datadog</span>

CodePudding user response:

Because the text is not inside the <i> tags, you should use .next or .next_sibling to get it. Also check your selectors: the class is class_='i-calendar', not class_='calendar':

jobs=[]
for job_element in job_elements:
    jobs.append({
        'location': job_element.find(class_='i-globe').next,
        'date': job_element.find('i', class_='i-calendar').next,
        'length': job_element.find('i', class_='i-chair').next,
        'company_name': job_element.find('i', class_='i-company').next,
        'description': job_element.find('p', class_='detail').text 
    })
    
jobs

Output

[{'location': ' New York City or Remote',
  'date': ' Thu, 03 Jun 2021',
  'length': ' permanent',
  'company_name': ' Datadog',
  'description': ' The\xa0Role In this role on our APM (tracing/profiling/debugging) team you will: Write open source code that instruments thousands of Python applications around the world. Drive our open source Python projects and...'},
 {'location': ' remote',
  'date': ' Sun, 11 Apr 2021',
  'length': ' permanent, part-time possible',
  'company_name': ' RealRate GmbH',
  'description': ' RealRate is Hiring Senior Python\xa0Developers! RealRate, the Artificial Intelligence rating agency is growing. We’re looking for a senior Python\xa0developer: More than 8 years of project\xa0experience. Python\xa0senior. Data...'},...]

CodePudding user response:

import requests
import json
from bs4 import BeautifulSoup

URL = "https://pythonjobs.github.io/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

job = soup.find(id='container')
job_elements = job.find_all(class_='job')


for job_element in job_elements:
    location = job_element.find('i', class_='i-globe')
    date = job_element.find('i', class_='i-calendar')
    length = job_element.find('i', class_='i-chair')
    company_name = job_element.find('i', class_='i-company')
    description = job_element.find('p', class_='detail')  # use description.text to get its text
    print(location.next_sibling.strip())  # the text sits right after the empty <i> inside the <span>
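If some listings lack one of the icons, find() returns None and chaining .next or .next_sibling onto it raises an AttributeError. One possible way around that, sketched below, is a small helper that reads the text of the icon's parent <span> and returns None when the icon is missing. The helper name field_text, the FIELDS mapping, and the sample markup (including the "info" class) are all assumptions made for illustration, not taken from the site.

```python
from bs4 import BeautifulSoup

# Assumed markup; this listing deliberately has no i-chair entry.
SAMPLE_HTML = """
<div class="job">
  <span class="info"><i class="i-globe"></i> New York City or Remote</span>
  <span class="info"><i class="i-calendar"></i> Thu, 03 Jun 2021</span>
  <span class="info"><i class="i-company"></i> Datadog</span>
</div>
"""

# Hypothetical mapping from output key to the icon class used in this thread.
FIELDS = {
    "location": "i-globe",
    "date": "i-calendar",
    "length": "i-chair",
    "company_name": "i-company",
}


def field_text(job_element, icon_class):
    """Return the text of the <span> holding the given icon, or None if absent."""
    icon = job_element.find("i", class_=icon_class)
    if icon is None:
        return None
    # The icon itself is empty, so the parent's text is just the label.
    return icon.parent.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
job_element = soup.find(class_="job")

job = {name: field_text(job_element, cls) for name, cls in FIELDS.items()}
print(job)
```

Reading the parent's text instead of the sibling is a design choice here: it still works if the label is ever wrapped in another tag after the icon.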