Scrape Job description Indeed Selenium


A similar question exists, but I couldn't find an exact answer, so could you please help me?

I copied the following code from the internet to scrape job offers from Indeed. The problem is that it cannot scrape the job descriptions.

When I use sum_div = job.find_elements_by_class_name('summary'), the code doesn't find 'summary', so it never reaches the element that holds the job description, and it is also unable to close the pop-up that appears on Indeed.

I tried another identifier, sum_div = job.find_element_by_class_name('job_seen_beacon'). That closes the pop-up, but it still doesn't locate the job description.

Do you have any idea how to solve this?

for i in range(0, 50, 10):
    driver.get('https://www.indeed.co.in/jobs?q=artificial intelligence&l=India&start=' + str(i))
    jobs = []
    driver.implicitly_wait(20)

    for job in driver.find_elements_by_class_name('result'):

        result_html = job.get_attribute('innerHTML')
        soup = BeautifulSoup(result_html, 'html.parser')

        try:
            title = soup.find(class_="jobTitle").text
        except:
            title = 'None'

        try:
            location = soup.find(class_="companyLocation").text
        except:
            location = 'None'

        try:
            company = soup.find(class_="companyName").text.replace("\n", "").strip()
        except:
            company = 'None'

        sum_div = job.find_elements_by_class_name('summary')
        #sum_div = job.find_element_by_class_name('job_seen_beacon')

        try:
            sum_div.click()
        except:
            close_button = driver.find_elements_by_class_name('popover-x-button-close')
            close_button.click()
            sum_div.click()

        driver.implicitly_wait(2)

        try:
            job_desc = driver.find_element_by_css_selector('div#vjs-desc').text
            print(job_desc)
        except:
            job_desc = 'None'

        df = df.append({'Title': title, 'Location': location, "Company": company,
                        "Description": job_desc}, ignore_index=True)
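Note that once a result card's innerHTML is in BeautifulSoup, the short description can be read without any further Selenium clicks. A minimal, self-contained sketch of that selection step (the HTML fragment below is a made-up stand-in for one Indeed result card, and the class names such as job-snippet are assumed to match Indeed's markup at the time of writing):

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the innerHTML of one result card;
# the class names mirror those used elsewhere in this thread.
card_html = """
<div class="job_seen_beacon">
  <h2 class="jobTitle">Data Scientist</h2>
  <span class="companyName">ExampleCorp</span>
  <div class="companyLocation">Bengaluru, Karnataka</div>
  <div class="job-snippet"><ul><li>Build ML models.</li><li>Deploy to prod.</li></ul></div>
</div>
"""

soup = BeautifulSoup(card_html, 'html.parser')
title = soup.find(class_="jobTitle").text
# get_text(strip=True) concatenates the stripped text of every <li>.
snippet = soup.select_one('div.job-snippet ul').get_text(strip=True)
print(title)    # Data Scientist
print(snippet)  # Build ML models.Deploy to prod.
```

Because Indeed renames classes periodically, it is worth re-checking the card markup in the browser's inspector whenever a selector stops matching.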

CodePudding user response:

The URL isn't dynamic, so there is no need for Selenium. You can extract the desired data with bs4 and requests. An example is given below.

P/S: You may not need try/except, since each page contains exactly 15 items.

from bs4 import BeautifulSoup
import requests
import pandas as pd

jobs = []
for i in range(0, 50, 10):
    url = 'https://www.indeed.co.in/jobs?q=artificial intelligence&l=India&start=' + str(i)
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')

    for job in soup.select('.result'):

        try:
            title = job.find(class_="jobTitle").text
        except:
            title = 'None'

        try:
            location = job.find(class_="companyLocation").text
        except:
            location = 'None'

        try:
            company = job.find(class_="companyName").text.replace("\n", "").strip()
        except:
            company = 'None'

        try:
            job_desc = job.select_one('div.job-snippet ul').get_text(strip=True)
        except:
            job_desc = 'None'

        jobs.append({'Title': title, 'Location': location, "Company": company,
                     "Description": job_desc})

df = pd.DataFrame(jobs)
print(df)
# to store data
# df.to_csv('data.csv', index=False)
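One caveat: the hand-built URL above contains a literal space in "artificial intelligence". Letting requests assemble the query string via its params argument keeps the URL properly percent-encoded. A small sketch (Request(...).prepare() only builds the request; nothing is sent over the network):

```python
import requests

# Build the search URL for each results page without hand-concatenating
# strings; requests URL-encodes the space in the query for us.
for start in range(0, 50, 10):
    req = requests.Request(
        'GET',
        'https://www.indeed.co.in/jobs',
        params={'q': 'artificial intelligence', 'l': 'India', 'start': start},
    ).prepare()
    print(req.url)
```

In the scraper itself you would simply pass the same params dict to requests.get() instead of preparing the request manually.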

Output:

                                                Title  ...                                        Description
0          newData Scientist: Artificial Intelligence  ...  As a Data Scientist at IBM, you will help tran...
1                             AI and Machine Learning  ...  A machine learning engineer (ML engineer) focu...
2                      newGraduate Intern - Technical  ...  DPEA enables that data center which is the und...
3   Artificial Intelligence & Machine Learning Expert  ...  Define and drive projects in AI and Machine Le...
4                              newML Data Associate I  ...  Good familiarity with the Windows desktop envi...
..                                                ...  ...                                                ...
70                                  newData Scientist  ...  Perform data analysis and modelling on data se...
71          AI, Informatics & ML – Research Scientist  ...  Years of experience 2-4 yrs.Key Responsibiliti...
72                               Software Development  ...  Software Developers at IBM are the backbone of...
73            newB2B/EDI - Map Development Specialist  ...  Software Developers at IBM are the backbone of...
74  Artificial Intelligence / Data Science/ Machin...  ...  TATA ELXSI Ltd. is conducting off campus drive...

[75 rows x 4 columns]