Python Web Scraping | How to scrape multiple urls with different page number through Beautifulsoup?

Time:09-22

from selenium import webdriver
import time
from bs4 import BeautifulSoup as Soup
driver = webdriver.Firefox(executable_path='C://Downloads//webdrivers//geckodriver.exe')
a = 'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page='
for c in range(8):

    #a = f'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page={c}'

    cd = driver.get(a + str(c))

    page_source = driver.page_source
    bs = Soup(page_source, 'html.parser')

    fetch_data = bs.find_all('div', {'class': 's-expand-height.s-include-content-margin.s-latency-cf-section.s-border-bottom'})

    for f_data in fetch_data:
        product_name = f_data.find('span', {'class': 'a-size-medium.a-color-base.a-text-normal'})
        print(product_name + '\n')

The problem is that the webdriver successfully visits 7 pages but produces neither output nor an error.

I don't know where I am going wrong.

Any suggestions, or references to an article that solves this problem, are welcome.

CodePudding user response:

You can print bs or fetch_data to debug.

Also, in my opinion, you can use requests or urllib to get page_source instead of selenium.
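
As a rough sketch of that debugging step, the snippet below counts what each lookup returns instead of printing product names directly. It assumes a tiny hand-written HTML string in place of driver.page_source; the markup and the class names s-a, s-b, title and big are hypothetical and are not real Amazon markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; not real Amazon markup.
SAMPLE_HTML = """
<div class="s-a s-b"><span class="title big">Phone A</span></div>
<div class="s-a s-b"><span class="title big">Phone B</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Dots are CSS-selector syntax. find_all compares the string with the class
# attribute itself, so a dotted string matches nothing here.
dotted = soup.find_all("div", {"class": "s-a.s-b"})

# The same dotted string is valid when passed to select() as a CSS selector.
selected = soup.select("div.s-a.s-b")

# Printing the counts shows which lookup is silently returning an empty list.
print(len(dotted), len(selected))

# Collect the text of the title span inside each matched div.
names = [div.find("span", class_="title").get_text(strip=True) for div in selected]
print(names)
```

If the count for a lookup is 0, the inner loop never runs, which would explain a script that finishes with no output and no error.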

CodePudding user response:

You are not selecting the right div tag to fetch the products with BeautifulSoup, so find_all returns an empty list and nothing is printed.

Try the following snippet:

#range of pages
for i in range(1,20):

    driver.get(f'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page={i}')
    page_source = driver.page_source
    bs = Soup(page_source, 'html.parser')
    
    #get search results
    products=bs.find_all('div',{'data-component-type':"s-search-result"})

    #for each product in the search results, print the product name
    for product in products:
        product_name = product.find('span',class_="a-size-medium a-color-base a-text-normal")
        #skip results that have no matching title span
        if product_name is not None:
            print(product_name.get_text(strip=True))
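
The page loop above builds each URL with an f-string. As a minimal sketch of the same idea using only the standard library, the helper below (build_page_urls is a hypothetical name, not part of the original code) constructs one search URL per page number. It only builds the strings and does not fetch anything.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.com/s"  # search endpoint taken from the question


def build_page_urls(first_page, last_page, keyword="Mobile", index="amazon-devices"):
    """Return one search URL per page number, first_page..last_page inclusive."""
    urls = []
    for page in range(first_page, last_page + 1):
        # urlencode escapes the values and joins the pairs with '&'
        query = urlencode({"k": keyword, "i": index, "page": page})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_page_urls(1, 3)
for url in urls:
    print(url)
```

Each URL can then be passed to driver.get(url) inside the loop. Note that range(8) in the question yields the page numbers 0 through 7, while the range in this answer starts at 1.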