How to add each item to a list in a for loop in BeautifulSoup4

Time:09-27

Hi, I have written a script that uses BeautifulSoup4 to extract a list of jobs along with their details and application links. I used a separate for loop for each value (link, title, company, etc.) because each piece of information is under a different class.

I have managed to write for loops that extract all of the data, but I am not sure how to pair the first result of the first loop (link) with the first result of the second loop (job title), and so on.

So my output is currently:

(There are 50 jobs on the search)

First 50 lines: application links. Second 50 lines: job titles, and so on.

import requests
import json
from bs4 import BeautifulSoup

URL = "https://remote.co/remote-jobs/developer/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

jobs = soup.find_all('a', class_='card m-0 border-left-0 border-right-0 border-top-0 border-bottom')
titles = soup.find_all('span', class_='font-weight-bold larger')
date_added = soup.find_all('span', class_='float-right d-none d-md-inline text-secondary')
company = soup.find_all('p', class_='m-0 text-secondary')


# No trailing slash: the href values on the page already start with "/"
remote = 'https://remote.co'

job_list = []

for a in jobs:
    link = a['href']
    print(f'Apply here: {remote}{link}')
    job_list.append(link)

for b in titles:
    job_list.append(b.text)

for c in date_added:
    job_list.append(c.text)

for d in company:
    job_list.append(d.text)

That is the code I have written. Can someone help me organise it so that each chunk of output contains:

Link to Apply

Job Title

Date the Job was Added

Name of Company and Working Hours

Here is a snippet of the HTML from the site:

<div >
          <div >
            <div >
              <h2  style="-webkit-box-flex:0;flex-grow:0;">Remote Developer Jobs</h2><div style="background:#00a2e1;-webkit-box-flex:1;flex-grow:1;height:3px;"></div>
            </div>
            <div >
              <div >

                  <p >
                    <a href="/remote-jobs/" style="font-size:18px;">
                      <em>
                                                  See all Remote Jobs >
                                              </em>
                    </a>
                  </p>
                  
                                                              <a href="/job/staff-frontend-web-developer-24/" >
                        
                          <div >
                            <div >
                              <div >
                                <img src="data:image/svg+xml," alt="Routable"  data-lazy-src="https://remoteco.s3.amazonaws.com/wp-content/uploads/2021/07/27194326/routable-150x150.png"/><noscript><img src="https://remoteco.s3.amazonaws.com/wp-content/uploads/2021/07/27194326/routable-150x150.png" alt="Routable" /></noscript>
                              </div>
                              <div >
                                <div >
                                  <p ><span >Staff Frontend Web Developer</span><span ><small><date>1 day ago</date></small></span></p>
                                    <p >
                                      Routable 
                                                   
                                                                
                                                                                  &nbsp;|&nbsp;<span ><small>Full-time</small></span>
                                                                                  &nbsp;|&nbsp;<span ><small>International</small></span>
                                                                                                                  </p>
                                </div>
                              </div>
                            </div>
                          
                        </div>    
                      </a>

CodePudding user response:

You can try the following example, which visits each job's detail page and collects the fields into one dictionary per job:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://remote.co/remote-jobs/developer')
soup = BeautifulSoup(page.content,'lxml')

data = []
for e in soup.select('div.card-body.p-0 > a'):
    # Build the absolute URL of the job's detail page and fetch it
    link = 'https://remote.co' + e.get('href')
    soup2 = BeautifulSoup(requests.get(link).content,'lxml')
    
    d = {
        'title':soup2.h1.text,
        'job_name':soup2.select_one('div.job_description > p').text,
        'company':soup2.select_one('div.co_name > strong').text,
        'date':soup2.select_one('.date_sm time').text.replace('Posted:',''),
        'Link':link
        }
    
    data.append(d)

print(data)

Output:

[{'title': 'Principal Software Engineer at Wisetack', 'job_name': 'Principal Software Engineer', 'company': 'Wisetack', 'date': ' 2 hours ago', 'Link': 'https://remote.co/job/principal-software-engineer-26/'}, {'title': 'Staff Frontend Web Developer at Routable', 'job_name': 'Staff Frontend Web Developer', 'company': 'Routable', 'date': ' 1 day ago', 'Link': 'https://remote.co/job/staff-frontend-web-developer-24/'}, {'title': 'Developer Advocate at DeepSource', 'job_name': 'Developer Advocate', 'company': 'DeepSource', 'date': ' 2 days ago', 'Link': 'https://remote.co/job/developer-advocate-24/'}, {'title': 'Senior GCP DevOps Engineer at RXMG', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'RXMG', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/senior-gcp-devops-engineer-23/'}, {'title': 'Growth Engineer, MarTech at Facet Wealth', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Facet Wealth', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/growth-engineer-martech-23/'}, {'title': 'DevOps Engineer at Oddball', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Oddball', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/devops-engineer-66/'}, {'title': 'DevOps Engineer at Paymentology', 'job_name': 'Location:\xa0 International, Anywhere; 100% remote', 'company': 'Paymentology', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/devops-engineer-67/'}, {'title': 'Director, Core Technology Software Development at Andela', 'job_name': 'Title: Director, Core Technology Software Development', 'company': 'Andela', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/director-core-technology-software-development-22/'}, {'title': 'Senior Developer – Net Core/C#/SQL (REMOTE or Local) at Cascade Financial Technology', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Cascade Financial Technology', 'date': ' 4 days ago', 'Link': 
'https://remote.co/job/senior-developer-net-core-c-sql-remote-or-local-22/'}, {'title': 'Front End Android Developer at Cascade Financial Technology', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'Cascade Financial Technology', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/front-end-android-developer-22/'}, {'title': 'Senior Backend Engineer – Python at Doist', 'job_name': 'Senior Backend Engineer (Python)', 'company': 'Doist', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/senior-backend-engineer-python-21/'}, {'title': "Front End Developer at Brad's Deals", 'job_name': 'Front End Developer', 'company': "Brad's Deals", 'date': ' 5 days ago', 'Link': 'https://remote.co/job/front-end-developer-21-2/'}, {'title': 'Director of Engineering at Farmgirl Flowers', 'job_name': 'Director of Engineering', 'company': 'Farmgirl Flowers', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/director-of-engineering-21/'}, {'title': 'Software Engineer, Backend Identity at Affirm', 'job_name': 'Title: Software Engineer, Backend (Identity)', 'company': 'Affirm', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/software-engineer-backend-identity-21/'}, {'title': 'Backend Developer (Node/Typescript) at CitizenShipper', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'CitizenShipper', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/backend-developer-node-typescript-20/'}, {'title': 'Fullstack Developer (TypeScript) at CitizenShipper', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'CitizenShipper', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/fullstack-developer-typescript-20/'}, {'title': 'Senior Software Engineer- Java at Method, Inc.', 'job_name': 'Location:\xa0 US Locations; 100% Remote', 'company': 'Method, Inc.', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/senior-software-engineer-java-2/'}, {'title': 'Senior Software Engineer – Backend at 
Varsity Tutors', 'job_name': 'Title:\xa0Senior Software Engineer (Backend) – Golang', 'company': 'Varsity 
Tutors', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/senior-software-engineer-backend-20/'}, {'title': 'Backend Engineer, Growth Engineering at Stripe, Inc.', 'job_name': 'Backend Engineer, Growth Engineering', 'company': 
'Stripe, Inc.', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/backend-engineer-growth-engineering-20/'}, {'title': 'Game Developer at Voodoo', 'job_name': 'Game Developer', 'company': 'Voodoo', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/game-developer-20/'}, {'title': 'Senior Ruby Engineer at Clearcover', 'job_name': 'Title: Sr. Ruby Engineer', 'company': 'Clearcover', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/senior-ruby-engineer-18/'}, {'title': 'Ruby Engineer at Clearcover', 'job_name': 'Title: Ruby Engineer', 'company': 'Clearcover', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/ruby-engineer-17/'}, {'title': 'DevOps Engineer at OCCRP', 'job_name': 'Location:\xa0 International, Anywhere; Freelance', 'company': 'OCCRP', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/devops-engineer-65/'}, {'title': 'Python Developer at ScienceLogic', 'job_name': 'Title:\xa0Python Developer', 'company': 'ScienceLogic', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/python-developer-16/'}, {'title': 'Senior Software Engineer – App Stores Backend at Canonical', 'job_name': 'Title:\xa0Senior Software Engineer – App Stores Backend (Remote)', 'company': 'Canonical', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/senior-software-engineer-app-stores-backend-16/'}, {'title': 'Software Engineer, Backend – Machine Learning Platform at 
Affirm', 'job_name': 'Software Engineer, Backend (Machine Learning Platform)', 'company': 'Affirm', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/software-engineer-backend-machine-learning-platform-14/'}, {'title': 'Senior 
Engineering Manager, Billing at Webflow', 'job_name': 'Title: Senior Engineering Manager, Billing', 'company': 'Webflow', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-engineering-manager-billing-14/'}, {'title': 'Senior Software Engineer, Anti-Tracking at Mozilla', 'job_name': 'Title: Senior Software Engineer, Anti-Tracking', 'company': 'Mozilla', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-anti-tracking-14/'}, {'title': 'Director of Engineering at Conserv', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'Conserv', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/director-of-engineering-14/'}, {'title': 'Lead Front End Developer- Email at Stitch Fix', 'job_name': 'Title:\xa0Lead Front End Developer- Email', 'company': 'Stitch Fix', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/lead-front-end-developer-email-13/'}, {'title': 'Technical Lead Growth Monetization, Frontend at HubSpot', 'job_name': 'Technical Lead Growth Monetization, Frontend (US/Remote)', 'company': 'HubSpot', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/technical-lead-growth-monetization-frontend-11/'}, {'title': 'Senior Software Engineer, Backend Debit  at Affirm', 'job_name': 'Title:\xa0Senior Software Engineer, Backend\xa0(Debit )', 'company': 'Affirm', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-backend-debit-11/'}, {'title': 'C   Graphics and Windowing System Software Engineer at Canonical', 'job_name': 'Title:\xa0C   Graphics and Windowing System Software Engineer\xa0– Mir', 'company': 'Canonical', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/c-graphics-and-windowing-system-software-engineer-9/'}, {'title': 'Senior Manager, Software Engineering at Myriad Genetics', 'job_name': 'Title:\xa0Senior Manager, Software Engineering', 'company': 'Myriad Genetics', 'date': ' 3 weeks ago', 'Link': 
'https://remote.co/job/senior-manager-software-engineering-8/'}, {'title': 'Senior Kernel Build Automation Engineer at Canonical', 'job_name': 'Title: Senior Kernel Build Automation Engineer ', 'company': 'Canonical', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-kernel-build-automation-engineer-8/'}, {'title': 'Engineering Manager – Full Stack at Betterment', 'job_name': 'Title: Engineering Manager – Full Stack', 'company': 'Betterment', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/engineering-manager-full-stack-7/'}, {'title': 'Principal Architect – Software Engineering at Citizens Bank', 'job_name': 'Principal Architect – Software Engineering', 'company': 'Citizens Bank', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/principal-architect-software-engineering-7/'}, {'title': 'Senior Software Engineer, Kubernetes Platform at Appboy', 'job_name': 'Title:\xa0Senior Software Engineer, Kubernetes Platform', 'company': 'Appboy', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-kubernetes-platform-7/'}, {'title': 'Senior React Native Developer at Toptal', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-react-native-developer-11/'}, {'title': 'Senior Blockchain Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-blockchain-developer-5/'}, {'title': 'Front-End Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/front-end-developer-5-2/'}, {'title': 'Senior DevOps Engineer at Toptal', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 
'https://remote.co/job/senior-devops-engineer-11-2/'}, {'title': 'Senior React Developer at Toptal', 'job_name': 'Location: Anywhere, International;\xa0 Freelance;\xa0 100% Remote', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-react-developer-5/'}, {'title': 'Full-Stack Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/full-stack-developer-5-2/'}, {'title': 'Senior Full Stack Developer: Long-term job – 100% remote at Proxify AB', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Proxify AB', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-full-stack-developer-long-term-job-100-remote-6/'}, {'title': 'Software Engineer – Backend at 0x', 'job_name': 'Software Engineer – Backend (Campus)', 'company': '0x', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/software-engineer-backend-5-2/'}, {'title': 'Engineering Manager at Array.com', 'job_name': 'Engineering Manager', 'company': 'Array.com', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/engineering-manager-5-2/'}, {'title': 'Senior Software Engineer, Canvas Facilitation at MURAL.co', 'job_name': 'Senior Software Engineer, Canvas Facilitation', 'company': 'MURAL.co', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-canvas-facilitation-5/'}, {'title': 'Backend Engineer at CareRev', 'job_name': 'Title:\xa0Backend Engineer', 'company': 'CareRev', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/backend-engineer-5-2/'}, {'title': 'Principal Software Engineer, Architect Cognitive Automation at Appian', 'job_name': 'Title:\xa0Principal Software Engineer/Architect (Cognitive Automation)', 'company': 'Appian', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/principal-software-engineer-architect-cognitive-automation-5/'}]
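If the listing page already carries the fields you need, a possible variant is to loop over each job card once and look up the title, date and company inside that card, which keeps the values paired without one extra request per job. This is only a sketch: the markup below is a simplified stand-in for the real page, and the class names card, title, date and company as well as the function name parse_jobs are hypothetical, so they would need to be replaced with the real class names from the site.

```python
from bs4 import BeautifulSoup

# Simplified stand-in markup; the class names here are assumptions,
# not the real class names used on remote.co.
SAMPLE_HTML = """
<a class="card" href="/job/staff-frontend-web-developer-24/">
  <p><span class="title">Staff Frontend Web Developer</span>
     <span class="date">1 day ago</span></p>
  <p class="company">Routable | Full-time | International</p>
</a>
<a class="card" href="/job/developer-advocate-24/">
  <p><span class="title">Developer Advocate</span>
     <span class="date">2 days ago</span></p>
  <p class="company">DeepSource | Full-time</p>
</a>
"""


def parse_jobs(html, base_url="https://remote.co"):
    """Return one dict per job card, searching only inside that card."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("a", class_="card"):
        jobs.append({
            "link": base_url + card["href"],
            "title": card.find("span", class_="title").get_text(strip=True),
            "date_added": card.find("span", class_="date").get_text(strip=True),
            "company": card.find("p", class_="company").get_text(" ", strip=True),
        })
    return jobs


for job in parse_jobs(SAMPLE_HTML):
    print(job)
```

Because every lookup starts from the card rather than from the whole document, a card with a missing field raises an error at that card instead of silently shifting all later values by one position.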

CodePudding user response:

Your lists have a little bit of unnecessary data at the moment. Can you provide an example of how it is supposed to look in the end?

However, you can use zip() to iterate over all of the lists at the same time:

jobs = soup.find_all('a', class_='card m-0 border-left-0 border-right-0 border-top-0 border-bottom')
titles = soup.find_all('span', class_='font-weight-bold larger')
dates_added = soup.find_all('span', class_='float-right d-none d-md-inline text-secondary')
companies = soup.find_all('p', class_='m-0 text-secondary')

for job, title, date_added, company in zip(jobs, titles, dates_added, companies):
    print(job, title, date_added, company)
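To go from the zipped tuples to one record per job, a minimal sketch could look like the following. Plain strings stand in for the text you would pull out of each tag (for example a['href'] and b.text), and the names combine, BASE_URL and the dictionary keys are hypothetical.

```python
# Stand-in values; in the real script these would come from the
# find_all() results (a['href'], b.text, and so on).
links = ["/job/staff-frontend-web-developer-24/", "/job/developer-advocate-24/"]
titles = ["Staff Frontend Web Developer", "Developer Advocate"]
dates_added = ["1 day ago", "2 days ago"]
companies = ["Routable", "DeepSource"]

BASE_URL = "https://remote.co"


def combine(links, titles, dates_added, companies):
    """Pair the i-th element of every list into one dict per job."""
    return [
        {
            "link": BASE_URL + link,
            "title": title,
            "date_added": date,
            "company": company,
        }
        for link, title, date, company in zip(links, titles, dates_added, companies)
    ]


job_list = combine(links, titles, dates_added, companies)

# Print each job as one block: link, title, date, company
for job in job_list:
    print(job["link"], job["title"], job["date_added"], job["company"], sep="\n", end="\n\n")
```

Note that zip() stops at the shortest input, so if one of the lists is missing an entry the remaining jobs are dropped without warning. On Python 3.10 or later, passing strict=True to zip() raises a ValueError instead.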