Scrape more than one page using python


I'm trying to scrape the different prices for an item, and I would like to scrape all the available pages so I can get the average price. I've tried the code below, but it is not working properly:

    page_num = 1
    page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn={page_num}')
    lists = soup.find_all('li', class_="s-item s-item__pl-on-bottom s-item--watch-at-corner")
    prices = []
    for list in lists:
        prices.append(float(list.find('span', class_="s-item__price").text.replace('£','').replace(',','')))
        page_num = page_num + 1
    avg = sum(prices)/len(prices)
    print(avg)

CodePudding user response:

You might be running into errors because you are shadowing Python's built-in "list" in your for-loop. Try changing "for list in lists" to "for item in lists" and update your loop contents accordingly.

Additionally, you define "page" at the beginning of your example code but never update the page number or make a new request. You will need to restructure your script so that the page number in the URL is updated, and the page re-requested, inside the loop, as sketched below.
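A minimal sketch of that restructuring, assuming for illustration that the number of pages is known up front (the page count below is a placeholder, not something returned by eBay):

import requests
from bs4 import BeautifulSoup

prices = []
num_pages = 3  # placeholder; the real page count has to be discovered or hard-coded

for page_num in range(1, num_pages + 1):
    # re-request the results page inside the loop so the page number actually changes
    page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn={page_num}').text
    soup = BeautifulSoup(page, 'html.parser')

    # "item" instead of "list" so the built-in name is not shadowed
    for item in soup.find_all('li', class_="s-item s-item__pl-on-bottom s-item--watch-at-corner"):
        price_tag = item.find('span', class_="s-item__price")
        if price_tag:
            prices.append(float(price_tag.text.replace('£', '').replace(',', '')))

print(sum(prices) / len(prices))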

CodePudding user response:

This should do what you want. I found the last page number, i.e. 9, and then scraped each page up to and including that last page.

There is, however, an issue with gathering all of the products: there are 9 pages and each page displays 60 products (by default), but I was only able to get 265 prices. The discrepancy is likely caused by the product li tags having different class attributes. For example, some of the class attributes only had s-item s-item__pl-on-bottom and not s-item--watch-at-corner; one way to widen the match is sketched after the code below.

import requests
from bs4 import BeautifulSoup

# getting html of first page to find total number of succeeding pages
page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=1').text
soup = BeautifulSoup(page, 'html.parser')

# find last page number
end_page = soup.find('a', href='https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=9&rt=nc').text

prices = []
page_num = 0

# gets html of each page until the last page is reached
while page_num < int(end_page):
    page_num += 1
    page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn={page_num}').text
    soup = BeautifulSoup(page, 'html.parser')

    # list of all li tags in a page
    lists = soup.find_all('li', class_="s-item s-item__pl-on-bottom s-item--watch-at-corner")

    # iterate over each page's li tags and append product price to a list
    for item in lists:
        prices.append(float(item.find('span', class_="s-item__price").text.replace('£','').replace(',','')))

# Average price of the scraped product prices
print(sum(prices)/len(prices))
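
If you also want the listings whose li tags only carry s-item s-item__pl-on-bottom, one option (a sketch, assuming the class names mentioned above are accurate) is to match on the shared s-item class with a CSS selector instead of the full class string:

# matches any li whose class list contains "s-item", regardless of extra modifier classes
lists = soup.select('li.s-item')

This replaces the find_all call inside the while loop; the rest of the price-parsing code stays the same.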