I recently started my very first Data Science project. I want to analyze specific job offers and therefore need to gather some data from a job portal.
Unfortunately, I am already stuck at the very beginning: I have trouble looping through pages. I know there are similar questions already, but none of the answers seems to help me (or maybe I simply do not understand them).
When scraping a single page, I get exactly the result I am looking for, e.g.:
Firma: Greiner AG , Job: Controller (m/w/d) , Arbeitsort: Sattledt , Online seit 8.2.2022
But as soon as I try to loop through pages, I get an error message:
Traceback (most recent call last):
  File "e:\Programmieren\Projects\Webscraping\laola1_scraper.py", line 18, in <module>
    job_title = jobs.find('h2', class_ = 'm-jobsListItem__title').text
AttributeError: 'NoneType' object has no attribute 'text'
I also tried starting with page 2. In that case I get 5 lines of results as intended, and then the same error message appears again.
I checked the spot on the website where my code breaks, but as far as I can tell there is no change in structure like the one mentioned in other cases.
I have been sitting here for almost 3 hours now and can't find a solution. I guess it's pretty simple, but what am I missing?
import requests
from bs4 import BeautifulSoup as bs

url_job = "https://www.karriere.at/jobs/controller-controlling/oberösterreich-zentralraum"
#response = requests.get(url_job)

for page in range(2,10):
    response = requests.get(url_job + "?page=" + str(page))
    data = bs(response.content, 'lxml')
    job = data.find_all('li', class_ = 'm-jobsList__item')

    for jobs in job:
        job_title = jobs.find('h2', class_ = 'm-jobsListItem__title').text
        job_company = jobs.find('div', class_ ='m-jobsListItem__company').text
        job_location = jobs.find('li', class_ ='m-jobsListItem__location').text
        job_date = jobs.find('span', class_ ='m-jobsListItem__date').text.replace("am","")
        print(f'''
        Firma:{job_company}, Job:{job_title}, Arbeitsort:{job_location}, Online seit{job_date}
        ''')
Thanks in advance
best, bones
CodePudding user response:
Your code is almost OK, but you need to skip specific list items (e.g. ads) that don't contain a job offer. For those items find() returns None, hence the AttributeError:
import requests
from bs4 import BeautifulSoup as bs

url_job = "https://www.karriere.at/jobs/controller-controlling/oberösterreich-zentralraum"

for page in range(10):
    response = requests.get(url_job + "?page=" + str(page))
    data = bs(response.content, "lxml")
    job = data.find_all("li", class_="m-jobsList__item")

    for jobs in job:
        # skip specific classes:
        if jobs.select_one(".m-brandingSolutionAdCard, .m-alarmDisruptor"):
            continue

        job_title = jobs.find("h2", class_="m-jobsListItem__title").text
        job_company = jobs.find("div", class_="m-jobsListItem__company").text
        job_location = jobs.find("li", class_="m-jobsListItem__location").text
        job_date = jobs.find(
            "span", class_="m-jobsListItem__date"
        ).text.replace("am", "")
        print(
            f"""Firma:{job_company}, Job:{job_title}, Arbeitsort:{job_location}, Online seit{job_date}"""
        )
Prints:
Firma: Oberbank AG , Job: MitarbeiterIn Kostenmanagement (m/w/d) , Arbeitsort: Linz , Online seit 5.2.2022
Firma: TOURISMUSVERBAND Region WELS , Job: MitarbeiterIn Buchhaltung und Kostenrechnung (Teilzeit bis 20 h) , Arbeitsort: Wels , Online seit 8.2.2022
Firma: Oberbank AG , Job: Junior Controller (m/w/d) - Karrierechance für BerufseinsteigerInnen , Arbeitsort: Linz , Online seit 8.2.2022
Firma: WIFI Oberösterreich , Job: Controlling/Kostenrechnung , Arbeitsort: Linz , Online seit 7.2.2022
Firma: Schlüsselbauer Technology GmbH & Co KG , Job: Kostenrechner - Controller (m/w/d) , Arbeitsort: Gaspoltshofen , Online seit 1.2.2022
Firma: ISG Personalmanagement GmbH , Job: Controller - Schwerpunkt HR (m/w/d) , Arbeitsort: Linz , Online seit 9.2.2022
Firma: ISG Personalmanagement GmbH , Job: Financial Controller (m/w/d) , Arbeitsort: Linz , Online seit 5.2.2022
Firma: Schulmeister Finance , Job: (Junior) Controller mit ausgezeichneten Entwicklungsmöglichkeiten (m/w/d) , Arbeitsort: Wels , Online seit 9.2.2022
Firma: Schulmeister Finance , Job: Controller (m/w/d) für Non Profit Organisation , Arbeitsort: Linz , Online seit 2.2.2022
Firma: VACE Engineering GmbH , Job: Financial Controller (m/w/d) , Arbeitsort: Linz , Online seit 10.2.2022
Firma: Schulmeister Finance , Job: (Senior) Controller (m/w/d) , Arbeitsort: Linz , Online seit 10.2.2022
Firma: ÖSWAG Maschinenbau GmbH , Job: Controller / Bilanzbuchhalter (m/w/d) , Arbeitsort: Linz , Online seit 10.2.2022
Firma: Maschinenring Personal und Service eGen , Job: (Senior-)Controller/in (m/w/d) , Arbeitsort: Linz , Online seit 10.2.2022
Firma: Schulmeister Finance , Job: Junior-Controller (m/w/d) , Arbeitsort: Linz , Online seit 9.2.2022
Firma: Schulmeister Finance , Job: Senior Controller (m/w/d) für innovatives Geschäftsfeld , Arbeitsort: Linz , Online seit 9.2.2022
Firma: Schulmeister Finance , Job: Serviceorientierter Controller mit Hands-On-Mentalität (m/w/d) , Arbeitsort: Linz , Online seit 9.2.2022
Firma: TGW Logistics Group , Job: Group Controller (m/w/d) , Arbeitsort: Marchtrenk , Online seit 9.2.2022
...and so on.
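As a possible variation, you could skip any list item that has no title instead of naming the ad classes explicitly, so the loop would also survive ad types that are not listed above. The following is only a minimal sketch: SAMPLE_HTML is made-up markup that imitates the listing (one job item, one ad item), text_or_none and parse_jobs are hypothetical helper names, and "html.parser" is used instead of "lxml" just so the example runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the listing: one real job item and one ad item.
# This is an assumption for illustration, not the site's actual HTML.
SAMPLE_HTML = """
<ul>
  <li class="m-jobsList__item">
    <h2 class="m-jobsListItem__title"> Controller (m/w/d) </h2>
    <div class="m-jobsListItem__company"> Greiner AG </div>
    <ul><li class="m-jobsListItem__location"> Sattledt </li></ul>
    <span class="m-jobsListItem__date">am 8.2.2022</span>
  </li>
  <li class="m-jobsList__item">
    <div class="m-alarmDisruptor">Job alert teaser</div>
  </li>
</ul>
"""


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else None


def parse_jobs(html):
    """Collect one dict per job offer, skipping list items without a title."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="m-jobsList__item"):
        title = text_or_none(item, "h2", "m-jobsListItem__title")
        if title is None:
            # No title: this <li> is not a job offer (e.g. an ad), so skip it.
            continue
        date = text_or_none(item, "span", "m-jobsListItem__date") or ""
        rows.append({
            "company": text_or_none(item, "div", "m-jobsListItem__company"),
            "title": title,
            "location": text_or_none(item, "li", "m-jobsListItem__location"),
            "date": date.replace("am", "").strip(),
        })
    return rows


jobs = parse_jobs(SAMPLE_HTML)
print(jobs)
```

To use it on the real pages, you would pass response.content from each request to parse_jobs instead of SAMPLE_HTML. Collecting dicts rather than printing also makes it easier to load the results into a table later.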