Scraping text in meta tag with selenium

Time:03-02

I'm trying to get the book description from the following webpage: https://bookshop.org/books/lucky-9798200961177/9781668002452

This is what I've got so far:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)
driver.get('https://bookshop.org/a/16709/9781668002452')
description = driver.find_elements_by_xpath('//meta[@content]')[0].text
description

Basically, I'm trying to get the text inside the content attribute of this HTML:


<meta name="description" content="REESE'S BOOK CLUB PICK NEW YORK TIMES BESTSELLER A thrilling roller-coaster ride about a heist gone terribly wrong, with a plucky protagonist who will win readers' hearts. What if you had the winning ticket ....">

But I couldn't locate the text in the content attribute. Can anyone advise how to get it?

CodePudding user response:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome('path_to_my_driver_on_local', options=options)

driver.get('https://bookshop.org/books/lucky-9798200961177/9781668002452')
# Hand the rendered page source to BeautifulSoup for parsing.
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
# Select the description block by its class and print its text.
description = soup.find_all('div', class_="title-description")
print(description[0].text)
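
The snippet above selects a div with the class title-description rather than the meta tag from the question. If the meta tag itself is wanted, its content attribute can also be read from the page source. The following is only a sketch using the standard library's html.parser; the names MetaContentParser, meta_content and the sample string are made up for illustration, and the sample stands in for driver.page_source.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of the first <meta name=...> tag seen."""

    def __init__(self, name="description"):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs; tag and attribute
        # names arrive lowercased.
        attributes = dict(attrs)
        if (
            tag == "meta"
            and attributes.get("name") == self.name
            and self.content is None
        ):
            self.content = attributes.get("content")


def meta_content(html, name="description"):
    """Return the content of <meta name=...> in html, or None if absent."""
    parser = MetaContentParser(name)
    parser.feed(html)
    return parser.content


# Hypothetical sample markup standing in for driver.page_source.
sample = (
    '<html><head>'
    '<meta name="description" content="A thrilling heist story.">'
    '</head></html>'
)
print(meta_content(sample))  # prints: A thrilling heist story.
```

With BeautifulSoup already imported, soup.find('meta', attrs={'name': 'description')) followed by a lookup of its content attribute should give the same result.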

CodePudding user response:

elem = driver.find_element(By.XPATH, "//meta[@name='description']")
print(elem.get_attribute("content"))

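You can use a more specific XPath that selects the meta tag by its name attribute, then read its content attribute. A meta element has no rendered text, so .text comes back empty.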

Imports:

from selenium.webdriver.common.by import By