Hello! Can anyone explain to me why my print is returning "None"?

Time:09-07

I'm practicing scraping with BeautifulSoup on a job page, but my print is returning "None" for some odd reason. Any ideas? Thanks in advance!

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://jobgether.com/es/oferta/63083ece6d137a0ac6e701e6-part-time-business-psychologist-intern'
website = requests.get(url)
Soup = BeautifulSoup(website.content, 'html.parser')

Title = Soup.find('h5', class_="mb-0 p-2 w-100 bd-highlight fs-22")
print(Title) 
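`find()` returns `None` whenever no matching tag exists in the HTML that `requests` actually downloaded. Here is a minimal offline sketch of that behaviour, assuming (as the answer below explains) that the job title is injected by JavaScript after page load, so the raw HTML is only an empty shell. Both HTML strings and the `find_title` helper are made-up stand-ins, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what requests receives: a JS app shell with no job data.
SHELL_HTML = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

# Hypothetical stand-in for the DOM after a browser has run the page's JavaScript.
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<h5 class="mb-0 p-2 w-100 bd-highlight fs-22">Part-Time Business Psychologist Intern</h5>'
    '</div></body></html>'
)

TITLE_CLASS = "mb-0 p-2 w-100 bd-highlight fs-22"


def find_title(html):
    """Return the title text, or None if the <h5> is not present in this HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("h5", class_=TITLE_CLASS)
    # find() returns None when nothing matches, so guard before touching the text
    return tag.get_text(strip=True) if tag is not None else None


print(find_title(SHELL_HTML))     # None: the tag is simply not in the downloaded HTML
print(find_title(RENDERED_HTML))  # Part-Time Business Psychologist Intern
```

The selector is not wrong; it just has nothing to match until the JavaScript runs, which `requests` never does.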

Answer:

That page is hydrated with data via a JavaScript API: the HTML that requests downloads doesn't contain the job details yet, which is why find() returns None. You can find that API in your browser's Dev Tools, under the Network tab, where you'll see the information being pulled as JSON from an API endpoint. This is one way to obtain that data, using requests:
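The page URL and the API endpoint share the same 24-character hex ID (`63083ece6d137a0ac6e701e6`). A minimal sketch that pulls that ID out of a job-page URL and builds the base endpoint, assuming every offer URL follows the `/oferta/<id>-<slug>` pattern seen here. `offer_id_from_url` and `offer_api_url` are hypothetical helper names, and whether the endpoint answers usefully without the long `$populate` query string is untested.

```python
import re

# Assumption: offer pages look like .../oferta/<24 hex chars>-<slug>
OFFER_ID_RE = re.compile(r"/oferta/([0-9a-f]{24})(?=[-/?#]|$)")

API_BASE = "https://filter-api.jobgether.com/api/offer/"


def offer_id_from_url(page_url):
    """Extract the 24-character hex offer ID from a job-page URL (hypothetical helper)."""
    match = OFFER_ID_RE.search(page_url)
    if match is None:
        raise ValueError(f"no offer ID found in {page_url!r}")
    return match.group(1)


def offer_api_url(page_url):
    """Build the base API URL for an offer, without any $populate query parameters."""
    return API_BASE + offer_id_from_url(page_url)


page = "https://jobgether.com/es/oferta/63083ece6d137a0ac6e701e6-part-time-business-psychologist-intern"
print(offer_api_url(page))
```

This makes it easy to reuse the same request for other offers; append the original `$populate` parameters if you need the nested company, country, and skill objects expanded.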

import requests
import pandas as pd

headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}


url = 'https://filter-api.jobgether.com/api/offer/63083ece6d137a0ac6e701e6?$populate[0][path]=meta.continents&$populate[0][select]=name&$populate[1]=meta.countries&$populate[2]=meta.regions&$populate[3]=meta.cities&$populate[4]=meta.studiesArea&$populate[5]=meta.salary&$populate[6]=meta.languages&$populate[7]=meta.hardSkills&$populate[8]=meta.industries&$populate[9]=meta.technologies&$populate[10][path]=company&$populate[10][select]=name meta.logo meta.industries meta.companyType meta.flexiblePolicy meta.employees meta.mainOfficeLocation meta.subOfficeLocation status description meta.mission meta.description meta.hardSkills meta.technologies meta.slug&$populate[10][populate][0]=meta.industries&$populate[10][populate][1]=meta.mainOfficeLocation&$populate[10][populate][2]=meta.subOfficeLocation'

r = requests.get(url, headers=headers)
obj = r.json()
print(obj['title'])
print(obj['meta']['apply_url'])
print(obj['meta']['countries'])
df = pd.json_normalize(obj['meta']['hardSkills'])
print(df)
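Indexing with `obj['meta']['apply_url']` raises `KeyError` if an offer lacks that field. A more defensive sketch, assuming the response keeps the shape shown in the output (a top-level `title`, plus `meta.apply_url` and `meta.countries` as a list of dicts with a `name` key). `summarize_offer` is a hypothetical helper and the sample dict is made up for illustration. Passing `timeout=` to `requests.get` and calling `r.raise_for_status()` before `r.json()` would likewise surface network and HTTP errors early.

```python
def summarize_offer(obj):
    """Pull a few fields out of the offer JSON, tolerating missing keys."""
    meta = obj.get("meta") or {}
    countries = meta.get("countries") or []
    return {
        "title": obj.get("title"),
        "apply_url": meta.get("apply_url"),
        # countries is a list of dicts; keep only the human-readable names
        "countries": [c.get("name") for c in countries if isinstance(c, dict)],
    }


# Made-up sample with the same shape as the real response (values shortened)
sample = {
    "title": "Part-Time Business Psychologist Intern",
    "meta": {
        "apply_url": "https://example.com/apply",
        "countries": [{"_id": "622a65b4671f2c8b98fac83f", "name": "United Kingdom"}],
    },
}

print(summarize_offer(sample))
print(summarize_offer({}))  # every field degrades to None or [] instead of raising
```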

This will print the following in the terminal:

Part-Time Business Psychologist Intern
https://it.linkedin.com/jobs/view/externalApply/3221880417?url=https://teamtailor.assessfirst.com/jobs/1462616-uk-part-time-business-psychologist-student-intern?promotion=464724-trackable-share-link-uk-business-psychologist-li&urlHash=dzk3&trk=public_jobs_apply-link-offsite
[{'_id': '622a65b4671f2c8b98fac83f', 'name': 'United Kingdom', 'alpha_code': 'GBR', 'continent': '622a659af0bac38678ed1398', 'geo': [-0.127758, 51.507351], 'name_es': 'Reino Unido', 'name_fr': 'Royaume-Uni', 'deleted_at': None, 'amount_of_use': 11407, 'alpha_2_code': 'GB'}]
   _id                       id    name              name_es           name_fr           category_id  status  createdAt            updatedAt            deletedAt  hard_skill_categories  hard_skill_category
0  623ca7112198fdff24e1a1b0  5     Design            Design            Design            1            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Marketing              621d2a97058dc9445a92c4be
1  623ca7112198fdff24e1a249  173   Research          Investigación     Recherche         8            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Business               621d2a97058dc9445a92c4c5
2  623ca7112198fdff24e1a24a  174   Science           Ciencia           Science           8            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Business               621d2a97058dc9445a92c4c5
3  623ca7112198fdff24e1a292  1165  Customer Success  Customer Success  Customer Success  4            1       2021-07-07 10:53:19  2021-07-07 10:53:19  None       Sales                  621d2a97058dc9445a92c4c1

You can print the full JSON response, inspect it, and extract whatever other information you need from it (it's quite comprehensive). Relevant documentation for requests:
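Because the response is deeply nested, it can help to list its key paths before deciding what to extract. A minimal, dependency-free sketch; `key_paths` is a hypothetical helper that only descends into dicts (lists and scalars are treated as leaves), and the sample stands in for `r.json()`.

```python
import json


def key_paths(obj, prefix="", max_depth=3):
    """Return dotted key paths in a nested JSON-like dict, up to max_depth levels."""
    paths = []
    if not isinstance(obj, dict) or max_depth == 0:
        return paths
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.append(path)
        # Recurse into nested objects; lists and scalars end the path here
        paths.extend(key_paths(value, path, max_depth - 1))
    return paths


# Made-up sample; with the real API you would pass obj = r.json() instead
sample = {
    "title": "Part-Time Business Psychologist Intern",
    "meta": {"apply_url": "https://example.com/apply", "countries": [{"name": "United Kingdom"}]},
}

for p in key_paths(sample):
    print(p)

# Or pretty-print everything; ensure_ascii=False keeps accents like "Investigación" readable
print(json.dumps(sample, indent=2, ensure_ascii=False))
```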

https://requests.readthedocs.io/en/latest/

And the pandas documentation for json_normalize:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
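`pd.json_normalize` flattens nested dicts into dot-separated column names, which is why it turns a list like `meta.hardSkills` into a flat table. A small sketch on made-up records; the nested `category` field is a hypothetical shape chosen to show the flattening, not necessarily how the real API nests its data.

```python
import pandas as pd

# Made-up records: one plain field and one nested object per item
records = [
    {"name": "Design", "category": {"name": "Marketing"}},
    {"name": "Research", "category": {"name": "Business"}},
]

df = pd.json_normalize(records)
print(df.columns.tolist())  # the nested key shows up as "category.name"
print(df[["name", "category.name"]])
```

Selecting just the columns you care about keeps the printed frame narrow instead of dumping every field.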
