This is the website I want to scrape: https://anime-hayai.com/play/30148/ตอนที่-1-hd.html
I want to scrape the src='....' value from the video element, but I get an empty string.
What I have tried:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://anime-hayai.com/play/30148/ตอนที่-1-hd.html')
video = driver.find_element_by_xpath("//*[@id='player']/div[2]/div[4]/video").text
print(video)
It returns '' (an empty string).
Am I doing something wrong? Expected result from the video src:
'https://stream.anime-hayai.com/videoplayback?id=6o7ov8-mxpu0aKZQYtLDpMuepaDas4tllm2jqqGUcqDE1c-j0pzahY1pxGuiaJhSteDG5NLdkaaVcotklmRye55ecaGWoJqio46hcs2XyKSmaKZQYqCXoovq'
CodePudding user response:
There are nested iframes, so first you have to:
- Switch to first iframe
- Switch to child iframe
Also, you would need to scroll all the way down.
Code :
chromedriver_autoinstaller.install()
# Raw string, so single backslashes are enough
driver_path = r'C:\Users\panabh02\OneDrive - CSG Systems Inc\Desktop\Automation\chromedriver.exe'
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://anime-hayai.com/play/30148/ตอนที่-1-hd.html")
driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='video_player']")))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[class='embed-responsive-item']")))
video_url = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[@class='jw-media jw-reset']//*"))).get_attribute('src')
print(video_url)
Imports:
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
https://stream.anime-hayai.com/videoplayback?id=6o7ov8-mxpu0aKZQYtLDpMuepaDas4tllm2jqqGUcqDE1c-j0pzahY1pxGuiaJhSteDG5NLdkaaVcotklmRye55ecaGWoJqio46hcs2XyKSmaKZQYqCXoovq
CodePudding user response:
https://anime-hayai.com/player/30148
is not the element's text; it is the value of its src
attribute, so .text will not return it.
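To illustrate the difference between an element's text and its src attribute, here is a small offline sketch using only the standard library's html.parser. It is not Selenium, and the markup and the VideoProbe class are made up for the example; it only shows that a video tag can carry a URL in src while having no text content at all.

```python
from html.parser import HTMLParser


class VideoProbe(HTMLParser):
    """Collects the src attribute and the inner text of the first <video> tag."""

    def __init__(self):
        super().__init__()
        self.src = None
        self.text = ""
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "video" and self.src is None:
            self._inside = True
            # attrs is a list of (name, value) pairs
            self.src = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag == "video":
            self._inside = False

    def handle_data(self, data):
        # Only text nodes between <video> and </video> count as its text
        if self._inside:
            self.text += data


# Hypothetical markup, shaped like a typical player <video> element
sample = (
    '<div class="jw-media jw-reset">'
    '<video src="https://example.com/videoplayback?id=abc"></video>'
    "</div>"
)

probe = VideoProbe()
probe.feed(sample)
probe.close()

print(repr(probe.text))  # the element has no text content
print(probe.src)         # the URL lives in the attribute
```

The same idea applies in Selenium: .text gives the rendered text of an element, while get_attribute("src") reads the attribute.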
Also, your locator is fragile because it depends on the position of each div.
Try this instead:
iframe_src = driver.find_element_by_xpath("//iframe[@name='video_player']").get_attribute("src")
You may have to add a delay or an explicit wait before that line so that the page is fully loaded before you read this element.
You may also have to switch to this iframe in order to access its contents. So, if the above solution still does not work, try this:
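As a rough picture of what an explicit wait does, here is a minimal polling sketch in plain Python. The wait_until helper and the fake_iframe_src function are hypothetical names made up for this example, and LookupError stands in for "element not found yet"; in real Selenium code you would normally use WebDriverWait instead of writing this yourself.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the iframe src only becomes available on the third check
attempts = {"count": 0}


def fake_iframe_src():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("iframe not in the DOM yet")
    return "https://anime-hayai.com/player/30148"


print(wait_until(fake_iframe_src, timeout=2.0, poll=0.01))
```

A fixed time.sleep() would also work, but polling returns as soon as the element is there and fails loudly if it never appears.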
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://anime-hayai.com/play/30148/ตอนที่-1-hd.html')
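wait = WebDriverWait(driver, 10)
# Read src while still in the parent document: after switching into the frame,
# the <iframe> element itself can no longer be found from inside it
iframe = wait.until(EC.presence_of_element_located((By.XPATH, "//iframe[@name='video_player']")))
iframe_src = iframe.get_attribute("src")
# Switch into the iframe only if you also need elements inside it
driver.switch_to.frame(iframe)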
print(iframe_src)