I am trying to scrape the h2 tag shown below from an Apple TV page, using the Python 3.10.6 code further down. I can see the h2 tag on the page, but my script, run from PyCharm 2022.1.4, fails to find it. episode-shelf-header is a unique class in the HTML of this page.
I searched for a solution but could not find one.
Can anyone help?
<div class="episode-shelf-header" id="{{@model.id}}-{{@shelf.id}}">
<h2 class="typ-headline-emph">
Season 1
</h2>
</div>
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')
pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
div = soup.find('div', attrs={'class': 'episode-shelf-header'})
h2 = div.find('h2', attrs={'class': 'typ-headline-emph'})
CodePudding user response:
You can use Selenium's built-in methods:
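from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')
# find_element_by_class_name was removed in Selenium 4.3.0; use find_element with By.CLASS_NAME
driver.find_element(By.CLASS_NAME, "episode-shelf-header").text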
Output:
Out[16]: 'Season 1'
CodePudding user response:
- The value can be extracted directly with Selenium.
- You must wait for the page to finish loading before the element is available.
Here is sample code that extracts the value:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://tv.apple.com/us/show/life-by-ella/umc.cmc.1suiyueh1ntwjtsstcwldofno?ctx_brand=tvs.sbd.4000')
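# The XPath matches the literal id attribute shown in the question's HTML
x_path = '//*[@id="{{@model.id}}-{{@shelf.id}}"]/h2'
# Retry the lookup for up to 10 seconds until the element can be found
element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.XPATH, x_path))
print(element.text)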
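To make the waiting step less of a black box, the sketch below mimics in plain Python roughly what the WebDriverWait(...).until(...) call above does: it retries a lookup at a fixed interval, treats "not found yet" as a reason to retry, and gives up with a timeout error. It is only an illustration, not Selenium's actual implementation. poll_until and FakePage are hypothetical names invented here, and LookupError stands in for Selenium's NoSuchElementException.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(probe: Callable[[], Optional[T]],
               timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call probe repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = probe()
            if result:
                return result
        except LookupError:
            # Stand-in for NoSuchElementException: the element is not there yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Hypothetical page whose heading only becomes findable on the third lookup."""

    def __init__(self) -> None:
        self.looks = 0

    def find_heading(self) -> str:
        self.looks += 1
        if self.looks < 3:
            raise LookupError("heading not rendered yet")
        return "Season 1"


page = FakePage()
heading = poll_until(page.find_heading, timeout=2.0, interval=0.01)
print(heading)  # Season 1
```

A single lookup made immediately after the page is requested corresponds to the first call of find_heading here, which fails; the retry loop is what lets the later, successful lookup happen.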
Note: the Selenium code above was run with selenium 4.3.0.
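If you would rather keep the BeautifulSoup parsing from the question, one option is to wait for the element first and only read driver.page_source afterwards. The sketch below assumes the class name from the question (episode-shelf-header). season_heading and fetch_rendered_source are hypothetical helper names, and the two HTML strings are made-up stand-ins for the page before and after rendering. The parsing helper also returns None instead of failing when the div is missing, which is likely what happened in the original script.

```python
from typing import Optional

from bs4 import BeautifulSoup


def season_heading(page_source: str) -> Optional[str]:
    """Return the text of the h2 inside div.episode-shelf-header, or None if absent."""
    soup = BeautifulSoup(page_source, "html.parser")
    div = soup.find("div", attrs={"class": "episode-shelf-header"})
    if div is None:
        # The snapshot was probably taken before the element existed.
        return None
    h2 = div.find("h2")
    return h2.get_text(strip=True) if h2 is not None else None


def fetch_rendered_source(url: str, timeout: float = 10.0) -> str:
    """Open url in Chrome, wait for the shelf header, then return the page source."""
    # Imported here so season_heading can be tried without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, "episode-shelf-header"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Made-up snapshots: one taken too early, one taken after rendering.
before = "<html><body><div id='app'></div></body></html>"
after = ('<div class="episode-shelf-header">'
         '<h2 class="typ-headline-emph">\nSeason 1\n</h2></div>')

print(season_heading(before))  # None
print(season_heading(after))   # Season 1
```

With a browser available, season_heading(fetch_rendered_source(url)) would combine the two steps, where url is the page address used in the question.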