Hello, this is my first project in Python, and my goal is to scrape the full description of books on Goodreads. The final goal of the script is to take the book IDs you enter and write a file with the book_id in one column and that book's description in another. For now, I can pick one item from the list by its index and get its description.
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]
How can I loop this procedure and get the description for each book? This is my code, thanks in advance.
import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError
book_id = ['17227298','18386','1852','17245','60533063'] # Here I enter my book ids
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0] # I concatenate book_id with the url
source = urlopen(my_urls).read()
soup = bs.BeautifulSoup(source, 'lxml')
short_description = soup.find('div', class_='readable stacked').span # finds the description div
full_description = short_description.find_next_siblings('span') # Goes to the sibling span that has the full description
def get_description(soup):
    full_description = short_description.find_next_siblings('span')
    return full_description
CodePudding user response:
Define a function that does the work for one item:
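def get_description(book_id):
    # Build the page URL for this book id
    my_urls = 'https://www.goodreads.com/book/show/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    # First span inside the description div holds the short description
    short_description = soup.find('div', class_='readable stacked').span
    # The sibling span(s) after it hold the full description
    full_description = short_description.find_next_siblings('span')
    return full_description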
Then call it on each item of the list
book_ids = ['17227298', '18386', '1852', '17245', '60533063']
for book_id in book_ids:
    print(get_description(book_id))
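To get from printing to the file you described (book_id in one column, description in the other), something like the following might work. It is only a sketch: the names write_descriptions, fetch_description and out_path are made up for illustration, and it assumes the fetching function returns plain text. It also skips any id whose fetch raises (for example an HTTPError, or an AttributeError when the description div is not found) and leaves that description empty.

```python
import csv

def write_descriptions(book_ids, fetch_description, out_path):
    """Write one (book_id, description) row per id to a CSV file.

    fetch_description is a hypothetical parameter: any callable that
    takes a book id and returns the description as text.
    """
    with open(out_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['book_id', 'description'])  # header row
        for book_id in book_ids:
            try:
                description = fetch_description(book_id)
            except Exception as err:
                # Keep going if one page fails; record an empty description
                print(f'Skipping {book_id}: {err}')
                description = ''
            writer.writerow([book_id, description])
```

Since find_next_siblings returns a list of tags rather than a string, you would probably pass a small wrapper instead of get_description itself, for example write_descriptions(book_ids, lambda b: ' '.join(s.get_text() for s in get_description(b)), 'descriptions.csv').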