I'm a Python beginner. I am using BeautifulSoup to scrape the details (title, quantity in stock) of all books on the first page of books.toscrape.com. To do that, I first need to collect the links to the individual books, so I wrote the function page1_url. The problem is that the function returns only the first link instead of the whole list. Please help me identify the error, or suggest alternative code that uses only BeautifulSoup. Thanks in advance!
import requests
from bs4 import BeautifulSoup

def page1_url(page1):
    response = requests.get(page1)
    data = BeautifulSoup(response.text, 'html.parser')
    b1 = data.find_all('h3')
    for i in b1:
        l = i.find_all('a')
        for j in l:
            l1 = j['href']
            books_urls = []
            books_urls.append(base_url + l1)
            books_urls = list(books_urls)
            return books_urls

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']
base_url = 'http://books.toscrape.com/catalogue/'
bookURLs = page1_url(allPages[0])
print(bookURLs)
Answer:
You are re-creating the books_urls list for every link, and the return statement sits inside the for j in l loop, so the function returns after the first element. Create the list once before the loops and return it after they finish:
import requests
from bs4 import BeautifulSoup

def page1_url(page1):
    response = requests.get(page1)
    data = BeautifulSoup(response.text, 'html.parser')
    b1 = data.find_all('h3')
    # you were rewriting this list for each link
    books_urls = []
    for i in b1:
        l = i.find_all('a')
        for j in l:
            l1 = j['href']
            books_urls.append(base_url + l1)
    # these lines had too many indents
    books_urls = list(books_urls)
    return books_urls

allPages = ['http://books.toscrape.com/catalogue/page-1.html',
            'http://books.toscrape.com/catalogue/page-2.html']
base_url = 'http://books.toscrape.com/catalogue/'
bookURLs = page1_url(allPages[0])
print(bookURLs)
['http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html', 'http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html', 'http://books.toscrape.com/catalogue/soumission_998/index.html', 'http://books.toscrape.com/catalogue/sharp-objects_997/index.html', ... 'http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html']
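Since you also asked for an alternative, here is a minimal sketch of one possible variant, not a replacement for the fix above. It assumes the book links are always an <a> directly inside an <h3> and that the href values are relative to the listing page, as the output above suggests. The names extract_book_urls and fetch_book_urls are made up for this example. It uses a CSS selector instead of nested loops and urllib.parse.urljoin instead of string concatenation, so no global base_url is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_book_urls(html, page_url):
    """Return absolute URLs for every <h3><a href=...> link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # "h3 > a[href]" matches each <a> with an href that is a direct child of an <h3>
    links = soup.select("h3 > a[href]")
    # urljoin resolves each relative href against the URL of the listing page
    return [urljoin(page_url, a["href"]) for a in links]


def fetch_book_urls(page_url):
    """Download a listing page and return the book URLs found on it."""
    import requests  # imported here so extract_book_urls can be used offline

    response = requests.get(page_url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_book_urls(response.text, page_url)


if __name__ == "__main__":
    print(fetch_book_urls("http://books.toscrape.com/catalogue/page-1.html"))
```

Keeping the parsing in its own function means it can be tried on a saved HTML string without hitting the site, and the list is built in a single expression, so there is no place to reset it or return early by accident.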