I'm new to Python and have been using the code below to get the class name as text for my CSV, but I can't get it to extract only the first one. Any idea how to do that?
for x in book_url_soup.findAll('p', class_="star-rating"):
    for k, v in x.attrs.items():
        review = v[1]
        reviews.append(review)
        del reviews[1]
        print(review)
The URL is: http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html
The output is:
Two
Two
One
One
Three
Five
Five
I only need the first value, and I don't know how to stop the code from also picking up the star ratings further down the page, which share the same class name.
CodePudding user response:
Instead of find_all(), which returns a ResultSet of every match, you could use find() or select_one() to select only the first occurrence of your element, then pick the last item from its list of class names:
soup.find('p', class_='star-rating').get('class')[-1]
or with a CSS selector:
soup.select_one('p.star-rating').get('class')[-1]
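To see why this works, here is a small offline sketch. The HTML string is a simplified stand-in for the page, not its real markup, and the `product_main` container name is an assumption used only to illustrate scoping the selector.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup (an assumption, not the real HTML).
html = """
<div class="product_main">
  <p class="star-rating Two"></p>
</div>
<section>
  <p class="star-rating One"></p>
  <p class="star-rating Five"></p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every match, including ratings further down the page.
all_ratings = [p.get("class")[-1] for p in soup.find_all("p", class_="star-rating")]

# find() and select_one() stop at the first match in document order.
first_rating = soup.find("p", class_="star-rating").get("class")[-1]
first_rating_css = soup.select_one("p.star-rating").get("class")[-1]

# Scoping the selector to a container avoids relying on document order.
scoped_rating = soup.select_one(".product_main p.star-rating").get("class")[-1]

print(all_ratings)
print(first_rating, first_rating_css, scoped_rating)
```

The class attribute is multi-valued, so get('class') returns a list such as ['star-rating', 'Two'], which is why the last item holds the rating.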
In new code, also avoid the old findAll() syntax and use find_all() instead.
For more, take a minute to check the docs.
Example
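from bs4 import BeautifulSoup
import requests
url = 'http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html'
page = requests.get(url).text
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(page, 'html.parser')
# First p.star-rating on the page; the last item of its class list is the rating
print(soup.find('p', class_='star-rating').get('class')[-1])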
Output
Two
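Since the question mentions a CSV, here is a minimal sketch of writing that single value out as text. The column names and the title value are placeholders, and the in-memory buffer only stands in for a real file.

```python
import csv
import io

# Placeholder values; in practice the rating would come from soup.find(...).
title = "It's Only the Himalayas"
rating = "Two"

# An in-memory buffer stands in for a file opened with
# open("books.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "rating"])
writer.writerow([title, rating])

print(buffer.getvalue())
```

Because find() returns one element, there is one rating string per book and no list to trim afterwards.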