How to scrape Glassdoor Company Overview

import requests
import json
headers = {'User-Agent': '1.Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0',
          'Accept': 'application/json'
          }
url = 'https://www.glassdoor.com/Job/didcot-england-junior-software-developer-jobs-SRCH_IL.0,14_IC3380446_KO15,40.htm?src=GD_JOB_AD&srs=ALL_RESULTS&jl=1007788802873&ao=1136043&s=345&guid=000001805a9ddcf09055e561485879b6&pos=101&t=SR-JOBS-HR&vt=w&uido=47F765FE71E439F398E8E149B3F8C23F&ea=1&cs=1_e20b1a86&cb=1650787737088&jobListingId=1007788802873&jrtk=3-0-1g1d9rn8pjm5g801-1g1d9rn9ipker800-bed2bb63e14539cd-'
r = requests.get(url, headers=headers).json()

I'm trying to get company URLs from the site. I know that Glassdoor has its own API, but I can't get API credentials to access it. So I'm trying to do it manually, but still without any results. Can anyone help me with this issue?

CodePudding user response：

Instead of using their API (which needs authentication), you can get the info through automated DOM analysis, that is, by loading the page and reading the data out of its HTML. There are many tools for that, and you can even write your own. I particularly like Selenium.

CodePudding user response：

Since you are requesting an ordinary URL and not an API endpoint, the response is HTML, not JSON, so calling .json() on it fails. If you were to change your code to

r = requests.get(url, headers=headers)
print(r.content)

You would get the raw HTML of the page back (r.text gives the same content decoded to a string). From there, you can use a package such as Beautiful Soup to extract data. However, you might find your specific case of getting the company URL to be a bit tough, since it isn't readily available on the page. Still, I hope this gets you started.
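As a rough illustration of the extraction step, here is a minimal sketch that pulls matching links out of an HTML string. To keep it runnable without installing anything, it uses the standard library's html.parser rather than Beautiful Soup; the idea is the same. The sample markup, the "/Overview/" path fragment, and the names LinkCollector and extract_links are all made up for the example and are not taken from the real Glassdoor page, so you would need to inspect the actual HTML and adapt them.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose href contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.needle in href:
            self.links.append(href)


def extract_links(html, needle):
    """Return all link targets in `html` that contain `needle`."""
    parser = LinkCollector(needle)
    parser.feed(html)
    return parser.links


# Made-up markup standing in for r.text; the real page is structured differently.
SAMPLE_HTML = """
<html><body>
  <a href="/Overview/example-company-1.htm">Example Company</a>
  <a href="/Job/some-listing.htm">A job listing</a>
  <a href="/Overview/example-company-2.htm">Another Company</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML, "/Overview/"))
```

With a real response you would pass r.text instead of SAMPLE_HTML. If the links you need are filled in by JavaScript after the page loads, they will not appear in the HTML that requests returns, which is where a browser-driving tool such as Selenium comes in.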