I can't scrape the text that comes after the "Product Description" heading on this page:
http://books.toscrape.com/catalogue/1000-places-to-see-before-you-die_1/index.html
This is my code so far:
import requests
from bs4 import BeautifulSoup

book_url = 'http://books.toscrape.com/catalogue/1000-places-to-see-before-you-die_1/index.html'
response = requests.get(book_url)
soup = BeautifulSoup(response.content, 'lxml')
book_body = soup.find('article', class_='product_page')
Should I extract all the "p" tags before the text?
CodePudding user response:
HTML IDs are unique (or at least should be), so prefer them when scraping whenever they are available. In your case, the "Product Description" heading sits in a div with the id product_description:
<div id="product_description" >
<h2>Product Description</h2>
</div>
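Note that this div contains only the heading; the description itself is in the <p> element that immediately follows it. So find the element with id="product_description", then take its next <p> sibling:
import requests
from bs4 import BeautifulSoup

book_url = 'http://books.toscrape.com/catalogue/1000-places-to-see-before-you-die_1/index.html'
response = requests.get(book_url)
soup = BeautifulSoup(response.content, 'lxml')

# The div with this id wraps only the <h2> heading
heading_div = soup.find(id='product_description')
# The description text lives in the next <p> sibling of that div
description = heading_div.find_next_sibling('p')
print(description.get_text(strip=True))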
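To check the lookup without hitting the network, here is a minimal sketch run against an inline HTML string. The sample markup and its placeholder paragraph are assumptions modeled on the snippet in this answer, not a copy of the real page, and the helper names are hypothetical. It uses the built-in 'html.parser' so lxml is not required, and it shows two ways to reach the paragraph that follows the heading div: find_next_sibling and the CSS adjacent-sibling selector.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the snippet in the answer
SAMPLE_HTML = """
<article class="product_page">
  <div id="product_description">
    <h2>Product Description</h2>
  </div>
  <p>Placeholder description text.</p>
</article>
"""


def extract_description(html):
    """Return the text of the <p> sibling after #product_description, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    heading_div = soup.find(id='product_description')
    if heading_div is None:
        # Some pages may have no description block at all
        return None
    paragraph = heading_div.find_next_sibling('p')
    return paragraph.get_text(strip=True) if paragraph else None


def extract_description_css(html):
    """Same lookup, written as a CSS adjacent-sibling selector."""
    soup = BeautifulSoup(html, 'html.parser')
    paragraph = soup.select_one('#product_description + p')
    return paragraph.get_text(strip=True) if paragraph else None


print(extract_description(SAMPLE_HTML))
print(extract_description_css(SAMPLE_HTML))
```

Returning None when the id is missing avoids an AttributeError on pages that lack a description, which may matter if you loop over many book pages.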