Loop Function in Python for webscraping

Time:04-10

Hello, this is my first project in Python, and my goal is to scrape the full description of books on Goodreads. The final goal of the script is to enter the book ids you want and get back a file with the book_id in one column and the description of that book in another. For now, I can enter the index of the item I want in the list and get its description: my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]. How can I loop this procedure and get the description for each book? This is my code, thanks in advance.

import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError

book_id = ['17227298', '18386', '1852', '17245', '60533063']  # Here I enter my book ids
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]  # I concatenate book_id with the url
source = urlopen(my_urls).read()
soup = bs.BeautifulSoup(source, 'lxml')
short_description = soup.find('div', class_='readable stacked').span  # finds the description div
full_description = short_description.find_next_siblings('span')  # Goes to the sibling span that has the full description

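def get_description(soup):
    # Look up the description inside the soup that was passed in
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description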

CodePudding user response:

Define a function that does the work for one item:

def get_description(book_id):
    my_urls = 'https://www.goodreads.com/book/show/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description

Then call it on each item of the list

book_ids = ['17227298', '18386', '1852', '17245', '60533063']
for book_id in book_ids:
    print(get_description(book_id))
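
The question also asks for a file with the book_id in one column and the description in another, which the loop above does not yet cover. Below is a minimal sketch of one way to do that with the standard csv module. The names write_descriptions and fake_fetch are hypothetical and not part of the original code; fake_fetch only stands in for get_description so that the sketch runs without network access.

```python
import csv
import io


def write_descriptions(book_ids, fetch_description, out_file):
    """Write one (book_id, description) row per id to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(["book_id", "description"])  # header row
    for book_id in book_ids:
        try:
            description = fetch_description(book_id)
        except Exception as exc:  # e.g. HTTPError, or AttributeError if the div is missing
            # Keep the row, but leave the description empty and report the problem
            description = ""
            print(f"No description for {book_id}: {exc}")
        writer.writerow([book_id, description])


def fake_fetch(book_id):
    # Hypothetical stand-in for get_description so the sketch runs offline
    return f"Description of book {book_id}"


# Write into an in-memory buffer here; a real run would pass an open file instead
buffer = io.StringIO()
write_descriptions(["18386", "1852"], fake_fetch, buffer)
csv_text = buffer.getvalue()
```

For a real run, the same function could be called as write_descriptions(book_ids, get_description, f) inside a block such as with open('descriptions.csv', 'w', newline='', encoding='utf-8') as f: (the file name is only an example). Note that get_description as written returns a list of span tags, so you would probably want to convert them to plain text first, for example by calling get_text() on each tag.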