I wrote the following code to scrape titles and authors from an Amazon best-sellers page. However, find_all only returns the first 30 books instead of all 50 books on the page.
I noticed that the first 30 books are the ones already loaded before scrolling down the page, but I'm not sure whether this is the reason.
from requests_html import HTMLSession
from bs4 import BeautifulSoup
s = HTMLSession()
url = "https://www.amazon.com/Best-Sellers-Kindle-Store-Arts-Photography/zgbs/digital-text/154607011/ref=zg_bs_nav_digital-text_3_157325011"
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
books = soup.find_all("div", {"class":"_p13n-zg-list-grid-desktop_truncationStyles_p13n-sc-css-line-clamp-1__1Fn1y"})
CodePudding user response:
Try using the requests library, and change the selector to something less dynamic than the class value used in your code, such as the id of each grid item. Sample code using requests:
from requests import session
from bs4 import BeautifulSoup
s = session()
url = "https://www.amazon.com/Best-Sellers-Kindle-Store-Arts-Photography/zgbs/digital-text/154607011/ref=zg_bs_nav_digital-text_3_157325011"
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
books = soup.find_all("div", {"id":"gridItemRoot"})
print(len(books))
This prints the following in the terminal:
50
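To pull the text out of each matched item, something like the following may work. This is only a minimal sketch: SAMPLE_HTML is hypothetical, simplified markup standing in for the real page, and extract_items is an illustrative helper name, not part of any library. The real Amazon markup is more deeply nested, so the text fragments you get back will need some filtering.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
# The class names here are made up to mimic auto-generated ones.
SAMPLE_HTML = """
<div id="gridItemRoot">
  <div class="title_ab12">Book One</div>
  <div class="author_cd34">Author A</div>
</div>
<div id="gridItemRoot">
  <div class="title_ef56">Book Two</div>
  <div class="author_gh78">Author B</div>
</div>
"""

def extract_items(html):
    """Return the text fragments of every element with id="gridItemRoot"."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Select on the id attribute rather than on generated class names.
    for root in soup.find_all("div", {"id": "gridItemRoot"}):
        # stripped_strings yields each non-empty text fragment in document order.
        items.append(list(root.stripped_strings))
    return items

items = extract_items(SAMPLE_HTML)
print(len(items))
for texts in items:
    print(texts)
```

Selecting on the id keeps the code working even if the generated class names differ between items or change later.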