I'm practicing scraping with BeautifulSoup on a job page, but my print returns "None" and I can't see why. Any ideas? Thanks in advance!
from bs4 import BeautifulSoup
import requests
import csv
url = 'https://jobgether.com/es/oferta/63083ece6d137a0ac6e701e6-part-time-business-psychologist-intern'
website = requests.get(url)
soup = BeautifulSoup(website.content, 'html.parser')
# find() returns None when no matching element exists in the downloaded HTML
title = soup.find('h5', class_="mb-0 p-2 w-100 bd-highlight fs-22")
print(title)
User response:
That page is hydrated with data from an API via JavaScript, so the h5 element you are looking for is not in the HTML that requests downloads, and find() returns None. You can locate the API by opening the browser's developer tools and watching the Network tab: the information is pulled as JSON from an API endpoint. This is one way to obtain that data, using requests:
import requests
import pandas as pd
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://filter-api.jobgether.com/api/offer/63083ece6d137a0ac6e701e6?$populate[0][path]=meta.continents&$populate[0][select]=name&$populate[1]=meta.countries&$populate[2]=meta.regions&$populate[3]=meta.cities&$populate[4]=meta.studiesArea&$populate[5]=meta.salary&$populate[6]=meta.languages&$populate[7]=meta.hardSkills&$populate[8]=meta.industries&$populate[9]=meta.technologies&$populate[10][path]=company&$populate[10][select]=name meta.logo meta.industries meta.companyType meta.flexiblePolicy meta.employees meta.mainOfficeLocation meta.subOfficeLocation status description meta.mission meta.description meta.hardSkills meta.technologies meta.slug&$populate[10][populate][0]=meta.industries&$populate[10][populate][1]=meta.mainOfficeLocation&$populate[10][populate][2]=meta.subOfficeLocation'
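r = requests.get(url, headers=headers)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
obj = r.json()  # the endpoint returns the whole offer as one JSON object
print(obj['title'])
print(obj['meta']['apply_url'])
print(obj['meta']['countries'])
# flatten the list of hard-skill dicts into a table, one row per skill
df = pd.json_normalize(obj['meta']['hardSkills'])
print(df)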
This prints the following in the terminal:
Part-Time Business Psychologist Intern
https://it.linkedin.com/jobs/view/externalApply/3221880417?url=https://teamtailor.assessfirst.com/jobs/1462616-uk-part-time-business-psychologist-student-intern?promotion=464724-trackable-share-link-uk-business-psychologist-li&urlHash=dzk3&trk=public_jobs_apply-link-offsite
[{'_id': '622a65b4671f2c8b98fac83f', 'name': 'United Kingdom', 'alpha_code': 'GBR', 'continent': '622a659af0bac38678ed1398', 'geo': [-0.127758, 51.507351], 'name_es': 'Reino Unido', 'name_fr': 'Royaume-Uni', 'deleted_at': None, 'amount_of_use': 11407, 'alpha_2_code': 'GB'}]
                        _id    id              name           name_es           name_fr  category_id  status            createdAt            updatedAt  deletedAt  hard_skill_categories       hard_skill_category
0  623ca7112198fdff24e1a1b0     5            Design            Design            Design            1       1  0000-00-00 00:00:00  0000-00-00 00:00:00       None              Marketing  621d2a97058dc9445a92c4be
1  623ca7112198fdff24e1a249   173          Research     Investigación         Recherche            8       1  0000-00-00 00:00:00  0000-00-00 00:00:00       None               Business  621d2a97058dc9445a92c4c5
2  623ca7112198fdff24e1a24a   174           Science           Ciencia           Science            8       1  0000-00-00 00:00:00  0000-00-00 00:00:00       None               Business  621d2a97058dc9445a92c4c5
3  623ca7112198fdff24e1a292  1165  Customer Success  Customer Success  Customer Success            4       1  2021-07-07 10:53:19  2021-07-07 10:53:19       None                  Sales  621d2a97058dc9445a92c4c1
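The id in the API URL (63083ece6d137a0ac6e701e6) is the same string that starts the last path segment of the page URL, so the endpoint can probably be derived from any offer page URL. A minimal sketch of that idea, using only the standard library; `offer_id_from_page_url` and `API_BASE` are names made up for illustration, and it is an assumption (not something the site documents) that other offers follow the same pattern. The `$populate...` query parameters from the request above would still need to be appended to get the populated fields.

```python
import re
from urllib.parse import urlparse

# Base of the endpoint seen in the Network tab (copied from the request URL above).
API_BASE = 'https://filter-api.jobgether.com/api/offer/'

def offer_id_from_page_url(page_url):
    """Return the 24-character hex id that starts the last path segment, or None."""
    last_segment = urlparse(page_url).path.rstrip('/').split('/')[-1]
    match = re.match(r'[0-9a-f]{24}(?=-|$)', last_segment)
    return match.group(0) if match else None

page_url = 'https://jobgether.com/es/oferta/63083ece6d137a0ac6e701e6-part-time-business-psychologist-intern'
offer_id = offer_id_from_page_url(page_url)
if offer_id is None:
    raise ValueError('no offer id found in ' + page_url)

# Assumption: the API path is simply the base plus the offer id.
api_url = API_BASE + offer_id
print(offer_id)
print(api_url)
```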
You can print out the full JSON response, inspect it, and extract whatever fields you need (it is quite comprehensive). Relevant documentation for requests:
https://requests.readthedocs.io/en/latest/
And the pandas documentation for json_normalize:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
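If the full response is too large to read comfortably, one option is to pretty-print part of it and list the key paths it contains. The sketch below runs on a small hand-written stand-in for `r.json()` that only includes a few of the keys shown in the output above; the real object has many more. `list_paths` is a hypothetical helper written for this example, not part of requests or pandas.

```python
import json

# Trimmed, hand-written stand-in for r.json(); the real response is much larger.
obj = {
    'title': 'Part-Time Business Psychologist Intern',
    'meta': {
        'countries': [{'name': 'United Kingdom', 'alpha_2_code': 'GB'}],
        'hardSkills': [{'id': 5, 'name': 'Design'}, {'id': 173, 'name': 'Research'}],
    },
}

def list_paths(node, prefix=''):
    """Return 'dotted.path: type' strings for each leaf; lists are sampled via their first item."""
    if isinstance(node, dict) and node:
        paths = []
        for key, value in node.items():
            paths.extend(list_paths(value, f'{prefix}.{key}' if prefix else key))
        return paths
    if isinstance(node, list) and node:
        return list_paths(node[0], prefix + '[]')
    # Scalars, None, and empty containers are reported as leaves.
    return [f'{prefix}: {type(node).__name__}']

# Pretty-print the start of the response, then show its overall shape.
print(json.dumps(obj, indent=2, ensure_ascii=False)[:300])
for path in list_paths(obj):
    print(path)
```

With the real response, replace the stand-in with `obj = r.json()`; the listed paths then tell you which lists are worth passing to `pd.json_normalize`.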