I'm trying to get the book description from the following webpage: https://bookshop.org/books/lucky-9798200961177/9781668002452
This is what I've got so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)
driver.get('https://bookshop.org/a/16709/9781668002452')
description = driver.find_elements_by_xpath('//meta[@content]')[0].text
description
Basically, I'm trying to get the text inside the content attribute of this HTML:
<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">
but I couldn't get at the text in the content attribute. Can anyone advise how to read it?
CodePudding user response:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
# Run Chrome without opening a browser window
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)
driver.get('https://bookshop.org/books/lucky-9798200961177/9781668002452')
# Hand the rendered HTML to BeautifulSoup (the 'lxml' parser must be installed)
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, 'lxml')
# Find the div(s) with class "title-description" and print the first one's text
description = soup.find_all('div', class_="title-description")
print(description[0].text)
CodePudding user response:
elem = driver.find_element(By.XPATH, "//meta[@name='description']")
print(elem.get_attribute("content"))
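A meta element has no text of its own, so .text comes back empty. Use an XPath that matches the tag by its name attribute, then read the content attribute with get_attribute.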
Imports:
from selenium.webdriver.common.by import By
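If you already have the page source as a string (for example from driver.page_source), the same idea can be tried without a browser. This is only a minimal sketch using the standard library's html.parser rather than Selenium or BeautifulSoup. The names MetaDescriptionParser, extract_meta_description and sample_html are made up for illustration, and the sample HTML is a shortened stand-in for the real page. It shows that the description lives in the content attribute, not in the element's text.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Hypothetical helper: remembers the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a <meta> tag carries its
        # data in attributes and has no inner text.
        if tag == "meta" and self.description is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == "description":
                self.description = attr_map.get("content")


def extract_meta_description(html):
    """Return the meta description of an HTML string, or None if absent."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


# Shortened stand-in for the real page source (assumption, not the live page)
sample_html = (
    '<html><head><meta charset="utf-8">'
    '<meta name="description" content="A thrilling roller-coaster ride">'
    '</head><body></body></html>'
)
print(extract_meta_description(sample_html))
```

With a live driver you would pass driver.page_source instead of sample_html. The function returns None when the page has no description meta tag, so check for that before using the result.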