How To Scrape Content With Load More Pages Using Selenium Python

Time:09-08

I need to scrape the titles of all blog post articles that are loaded via a Load More button, for the number of pages I set with for i in range(1,3):

At present I'm only able to capture the titles from the first page, even though I'm able to navigate to the next page using Selenium.

Any help would be much appreciated.

from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

# Selenium Routine
from requests_html import HTMLSession
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

# Removes SSL Issues With Chrome
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
options.add_argument('--ignore-certificate-errors-spki-list')
options.add_argument('log-level=3') 
options.add_argument('--disable-notifications')
#options.add_argument('--headless') # Comment to view browser actions


# Get website url
urls = "https://jooble.org/blog/"
r = requests.get(urls)

# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(executable_path=r"C:\webdrivers\chromedriver.exe", options=options)
driver.get(urls)

productlist = []

for i in range(1,3):
    
    # Get Page Information
    soup = BeautifulSoup(r.content, features='lxml')
    items = soup.find_all('div', class_ = 'post')
    print(f'LOOP: start [{len(items)}]')

    for single_item in items:
        title = single_item.find('div', class_ = 'front__news-title').text.strip()
        print('Title:', title)

        product = {
        'Title': title,
        }
        productlist.append(product)

    print()
    time.sleep(5)
    WebDriverWait(driver, 40).until(EC.element_to_be_clickable((By.XPATH,"//button[normalize-space()='Show more']"))).send_keys(Keys.ENTER)

driver.close()

# Save Results
df = pd.DataFrame(productlist)
df.to_csv('Results.csv', index=False)

CodePudding user response:

You do not need the overhead of Selenium in this case, because you can use requests directly to get the question-specific data via the API.

Check the network tab in your browser's devtools while you click the button to find the URL that is requested to load more content. Then iterate and set the parameter value &page={i}.
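As a rough sketch of that iteration, the query string can be assembled from a dict so that only the page value changes between requests. The endpoint and parameter values are copied from the URL used in the example further down; the names AJAX_ENDPOINT, BASE_PARAMS and build_load_more_url are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode

# Endpoint and parameters as seen in the browser's network tab.
# The constant names and the helper below are illustrative assumptions.
AJAX_ENDPOINT = "https://jooble.org/blog/wp-admin/admin-ajax.php"
BASE_PARAMS = {
    "id": "",
    "post_id": "0",
    "slug": "home",
    "canonical_url": "https://jooble.org/blog/",
    "posts_per_page": "6",
    "offset": "20",
    "post_type": "post",
    "repeater": "default",
    "seo_start_page": "1",
    "preloaded": "false",
    "preloaded_amount": "0",
    "lang": "en",
    "order": "DESC",
    "orderby": "date",
    "action": "alm_get_posts",
    "query_type": "standard",
}


def build_load_more_url(page):
    """Return the load-more URL for a single page number."""
    # Copy the fixed parameters and append the only one that varies.
    params = dict(BASE_PARAMS, page=str(page))
    return f"{AJAX_ENDPOINT}?{urlencode(params)}"


for i in range(1, 3):
    print(build_load_more_url(i))
```

With requests, the same dict can also be passed directly, as in requests.get(AJAX_ENDPOINT, params=params), which performs the encoding itself.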

Example

import requests
import pandas as pd
from bs4 import BeautifulSoup

data = []
for i in range(1, 3):
    # URL requested by the "Show more" button; only the page parameter changes
    url = f'https://jooble.org/blog/wp-admin/admin-ajax.php?id=&post_id=0&slug=home&canonical_url=https://jooble.org/blog/&posts_per_page=6&page={i}&offset=20&post_type=post&repeater=default&seo_start_page=1&preloaded=false&preloaded_amount=0&lang=en&order=DESC&orderby=date&action=alm_get_posts&query_type=standard'
    r = requests.get(url)
    if r.status_code != 200:
        print(f'Error occurred: {r.status_code} on url: {url}')
    else:
        # The JSON response carries the rendered posts in its 'html' field
        soup = BeautifulSoup(str(r.json()['html']), 'html.parser')
        for e in soup.select('.type-post'):
            data.append({
                'title': e.select_one('.front__news-title').get_text(strip=True),
                'description': e.select_one('.front__news-description').get_text(strip=True),
                'url': e.a.get('href')
            })

# Displays the collected rows (e.g. in a notebook)
pd.DataFrame(data)

Output

  | title | description | url
0 | How To Become A Copywriter | If you have a flair for writing, you might consider leveraging your talents to earn some dough by working as a copywriter. The primary aim of a copywriter is to… | https://jooble.org/blog/how-to-become-a-copywriter/
1 | How to Find a Job in 48 Hours | A job search might sound scary for many people. However, it doesn't have to be challenging, long, or overwhelming. With Jooble, it is possible to find the best employment opportunities… | https://jooble.org/blog/how-to-find-a-job-in-48-hours/
2 | 17 Popular Jobs That Involve Working With Animals | If you are interested in caring for or helping animals, you can build a successful career in this field. The main thing is to find the right way. Working with… | https://jooble.org/blog/17-popular-jobs-that-involve-working-with-animals/
3 | How to Identify Phishing and Email Scam | What Phishing and Email Scam Are Cybercrime is prospering, and more and more internet users are afflicted daily. The best example of an online scam is the phishing approach -… | https://jooble.org/blog/how-to-identify-phishing-and-email-scam/
4 | What To Do After Getting Fired | For many people, thoughts of getting fired tend to be spine-chilling. No wonder, since it means your everyday life gets upside down in minutes. Who would like to go through… | https://jooble.org/blog/what-to-do-after-getting-fired/
5 | A mobile application for a job search in 69 countries has appeared | Jooble, a job search site in 69 countries, has launched the Jooble Job Search mobile app for iOS and Android. It will help the searcher view vacancies more conveniently and… | https://jooble.org/blog/a-mobile-application-for-a-job-search-in-69-countries-has-appeared/

...
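If Selenium is preferred anyway, note that the loop in the question only ever saw the first page because it parsed r.content, the HTML fetched once by requests, rather than the browser's live DOM. A minimal sketch of a fix, assuming the post and front__news-title class names and the 'Show more' button text from the question still match the page, clicks the button first and then parses driver.page_source. The function names extract_titles and scrape_with_selenium are hypothetical.

```python
import time

from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the post titles found in one HTML snapshot."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for post in soup.find_all("div", class_="post"):
        title_div = post.find("div", class_="front__news-title")
        if title_div is not None:
            titles.append(title_div.get_text(strip=True))
    return titles


def scrape_with_selenium(driver, clicks=2, pause=5):
    """Click 'Show more' `clicks` times, then parse the fully loaded page."""
    # Imported here so extract_titles can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    for _ in range(clicks):
        button = WebDriverWait(driver, 40).until(
            EC.element_to_be_clickable(
                (By.XPATH, "//button[normalize-space()='Show more']")
            )
        )
        button.click()
        time.sleep(pause)  # crude wait for the newly loaded posts to render

    # Parse the browser's current DOM, not the response fetched via requests.
    return extract_titles(driver.page_source)
```

After driver.get(urls), this could be called as titles = scrape_with_selenium(driver). Because the button appends posts to the same page, parsing once at the end avoids collecting the first posts repeatedly.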
