I'm trying to get the data from all pages. I used a counter, cast it to a string to put the page number in the URL, and looped over it, but I always get the same result. This is my code:
# Scraping job offers from the HelloWork website
#import libraries
import random
import requests
import csv
from bs4 import BeautifulSoup
from datetime import date
#configure user agent for mozilla browser
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0",
    "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"
]
random_user_agent = random.choice(user_agents)
headers = {'User-Agent': random_user_agent}
Here is where I used my counter:
i=0
for i in range(1,15):
    url = 'https://www.hellowork.com/fr-fr/emploi/recherche.html?p=' + str(i)
    print(url)
    page = requests.get(url,headers=headers)
    if (page.status_code==200):
        soup = BeautifulSoup(page.text,'html.parser')
        jobs = soup.findAll('div',class_=' new action crushed hoverable !tw-p-4 md:!tw-p-6 !tw-rounded-2xl')
        #config csv
        csvfile=open('jobList.csv','w+',newline='')
        row_list=[] #to append list of job
        try :
            writer=csv.writer(csvfile)
            writer.writerow(["ID","Job Title","Company Name","Contract type","Location","Publish time","Extract Date"])
            for job in jobs:
                id = job.get('id')
                jobtitle= job.find('h3',class_='!tw-mb-0').a.get_text()
                companyname = job.find('span',class_='tw-mr-2').get_text()
                contracttype = job.find('span',class_='tw-w-max').get_text()
                location = job.find('span',class_='tw-text-ellipsis tw-whitespace-nowrap tw-block tw-overflow-hidden 2xsOld:tw-max-w-[20ch]').get_text()
                publishtime = job.find('span',class_='md:tw-mt-0 tw-text-xsOld').get_text()
                extractdate = date.today()
                row_list=[[id,jobtitle,companyname,contracttype,location,publishtime,extractdate]]
                writer.writerows(row_list)
        finally:
            csvfile.close()
CodePudding user response:
- In newer code, avoid the old syntax findAll(); use find_all() instead, or select() with CSS selectors. For more, take a minute to check the docs.
- BeautifulSoup is not strictly needed here. You could get all of this information, and more, directly from the API using a mix of requests and pandas. Check all available information here: https://www.hellowork.com/searchoffers/getsearchfacets?p=1
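To illustrate the first point, here is a minimal sketch of find_all() next to select(). The HTML snippet and the class name offer are made up for the illustration and are much simpler than the real HelloWork markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only meant to show the two lookup styles.
html = """
<div class="offer" id="job-1"><h3><a href="#">Chef Gérant H/F</a></h3></div>
<div class="offer" id="job-2"><h3><a href="#">Conducteur PL H/F</a></h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() is the current spelling of the legacy findAll().
titles_find = [
    offer.h3.a.get_text(strip=True)
    for offer in soup.find_all("div", class_="offer")
]

# select() takes a CSS selector and can reach the nested link in one step.
titles_css = [a.get_text(strip=True) for a in soup.select("div.offer h3 a")]

print(titles_find)
print(titles_css)
```

Both lists should contain the same two titles; which style to use is mostly a matter of taste.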
Example
import requests
import pandas as pd
from datetime import datetime

# Fetch pages 1 to 14 from the JSON endpoint and stack them into one DataFrame
df = pd.concat(
    [
        pd.json_normalize(
            requests.get(f'https://www.hellowork.com/searchoffers/getsearchfacets?p={i}', headers={'user-agent':'bond'}).json(),
            record_path=['Results']
        )[['ContractType','Localisation', 'OfferTitle', 'PublishDate', 'CompanyName']]
        for i in range(1,15)
    ],
    ignore_index=True
)
# Stamp every row with the extraction date, then save everything in one go
df['extractdate'] = datetime.today().strftime('%Y-%m-%d')
df.to_csv('jobList.csv', index=False)
Output
| | ContractType | Localisation | OfferTitle | PublishDate | CompanyName | extractdate |
|---|---|---|---|---|---|---|
| 0 | CDI | Beaurepaire - 85 | Chef Gérant H/F | 2023-01-24T16:35:15.867 | Armonys Restauration - Morbihan | 2023-01-24 |
| 1 | CDI | Saumur - 49 | Dessinateur Métallerie Débutant H/F | 2023-01-24T16:35:14.677 | G2RH | 2023-01-24 |
| 2 | Franchise | Villenave-d'Ornon - 33 | Courtier en Travaux de l'Habitat pour Particuliers et Professionnels H/F | 2023-01-24T16:35:13.707 | Elysée Concept | 2023-01-24 |
| 3 | Franchise | Montpellier - 34 | Courtier en Travaux de l'Habitat pour Particuliers et Professionnels H/F | 2023-01-24T16:35:12.61 | Elysée Concept | 2023-01-24 |
| 4 | CDD | Monaco | Spécialiste Senior Développement Matières Premières Cosmétique H/F | 2023-01-24T16:35:06.64 | Expectra Monaco | 2023-01-24 |
| ... | ... | ... | ... | ... | ... | ... |
| 275 | CDI | Brétigny-sur-Orge - 91 | Magasinier - Cariste H/F | 2023-01-24T16:20:16.377 | DELPHARM | 2023-01-24 |
| 276 | CDI | Lille - 59 | Technicien Helpdesk Français - Italien H/F | 2023-01-24T16:20:16.01 | Akkodis | 2023-01-24 |
| 277 | CDI | Tours - 37 | Conducteur PL H/F | 2023-01-24T16:20:15.197 | Groupe Berto | 2023-01-24 |
| 278 | Franchise | Nogent-le-Rotrou - 28 | Courtier en Travaux de l'Habitat pour Particuliers et Professionnels H/F | 2023-01-24T16:20:12.29 | Elysée Concept | 2023-01-24 |
| 279 | CDI | Cholet - 49 | Ingénieur Assurance Qualité H/F | 2023-01-24T16:20:10.837 | Akkodis | 2023-01-24 |
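
If you would rather keep the csv module from the question, one likely cause of "always the same result" is the file being reopened in write mode for every page: write mode truncates the file, so only the last page written survives. The sketch below assumes that is what happens and opens the output once, before the loop. fetch_rows() is a hypothetical stand-in for the requests and BeautifulSoup step, and an in-memory buffer replaces the real file so the snippet runs without network access.

```python
import csv
import io
from datetime import date

HEADER = ["ID", "Job Title", "Extract Date"]


def fetch_rows(page_number):
    """Hypothetical stand-in for downloading and parsing one result page."""
    return [
        [f"job-{page_number}-{n}", f"Title {n} (page {page_number})"]
        for n in range(2)
    ]


def write_all_pages(csvfile, first_page, last_page):
    """Write the header once, then the rows of every page, to one open file."""
    writer = csv.writer(csvfile)
    writer.writerow(HEADER)
    for page_number in range(first_page, last_page + 1):
        for job_id, title in fetch_rows(page_number):
            writer.writerow([job_id, title, date.today().isoformat()])


# In-memory buffer used here instead of a real file.
buffer = io.StringIO()
write_all_pages(buffer, 1, 3)
lines = buffer.getvalue().splitlines()
print(len(lines))
```

With a real file, the same function could be called inside `with open('jobList.csv', 'w', newline='', encoding='utf-8') as f:` so that the file is opened, and the header written, only once.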