Here is the screenshot of the HTML structure for the page I am trying to scrape.
You can see that there is a <table>
element with class "waffle". When I use the XPath
//table[@class='waffle']
in the Chrome console, it works as expected.
However, when I use the same XPath in Selenium, it doesn't work.
container_xpath = "//table[@class='waffle']"
# wait
try:
    wait = WebDriverWait(driver, 30)
    container = wait.until(EC.presence_of_element_located((By.XPATH, container_xpath)))
    print('container found')
except Exception as e:
    print('container not found')
    raise PageDidNotLoadError
    return
The Python script prints "container not found".
What is wrong with Selenium?
CodePudding user response:
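It's common practice to place elements inside nested iframes. Selenium only searches the document it is currently focused on, so an element inside an iframe is not found until you switch into that frame. You need to switch to the outer iframe first and then to the inner one. The code below should work for you: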
# Switch to outer iframe
oframe = driver.find_element(By.CSS_SELECTOR, 'iframe')
driver.switch_to.frame(oframe)
# Switch to nested frame
iframe = driver.find_element(By.CSS_SELECTOR, 'iframe#pageswitcher-content')
driver.switch_to.frame(iframe)
# get the container
container = wait.until(EC.presence_of_element_located((By.XPATH, container_xpath)))
To get the same data as a table, you can do:
import pandas as pd
# read_html returns a list of DataFrames; the container holds a single table
table = pd.read_html(container.get_attribute('outerHTML'))[0]
| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | カード名 | 仕様 | レア | 型番 | タイプ | 状態A |
| 1 | nan | nan | nan | nan | nan | nan | nan |
| 2 | 2 | nan | nan | nan | nan | nan | nan |
| 3 | 3 | 【スペシャルアート(TAG TEAM GX)】 | nan | nan | nan | nan | nan |
| 4 | 4 | フシギバナ&ツタージャGX | SA | SR | 066/064 | 草 | 3300 |
| 5 | 5 | セレビィ&フシギバナGX | SA | SR | 097/095 | 草 | 3500 |
| 6 | 6 | モクロー&アローラナッシーGX | SA | SR | 056/054 | 草 | 3300 |
| 7 | 7 | フェローチェ&マッシブーンGX | SA | SR | 056/054 | 草 | 2300 |
| 8 | 8 | レシラム&リザードンGX | SA | SR | 097/095 | 炎 | 20000 |
| 9 | 9 | リザードン&テールナーGX | SA | SR | 068/064 | 炎 | 6000 |
| 10 | 10 | カメックス&ポッチャマGX | SA | SR | 070/064 | 水 | 5000 |
| 11 | 11 | コイキング&ホエルオーGX | SA | SR | 099/095 | 水 | 5500 |
| 12 | 12 | ヤドン&コダックGX | SA | SR | 096/094 | 水 | 4000 |
| 13 | 13 | ピカチュウ&ゼクロムGX | SA | SR | 101/095 | 雷 | 30000 |
| 14 | 14 | ライチュウ&アローラライチュウGX | SA | SR | 057/054 | 雷 | 5500 |
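In the output above, the first data row holds the real column names, the first column repeats the sheet's row numbers, and some rows are empty. A minimal sketch of tidying that up follows. The helper name `tidy_sheet` is hypothetical, and it assumes the layout seen in this particular sheet (header in row 0, row numbers in column 0), which may not hold for other sheets.

```python
import pandas as pd


def tidy_sheet(raw):
    """Promote the first row to column names and drop empty rows.

    Assumes column 0 holds the sheet's row numbers and row 0 holds the
    header cells, as in the output shown above.
    """
    header = raw.iloc[0, 1:].tolist()   # header cells, skipping the row-number column
    body = raw.iloc[1:, 1:].copy()      # everything below the header row
    body.columns = header
    # Rows that are NaN in every remaining column carry no data.
    return body.dropna(how="all").reset_index(drop=True)
```

With the DataFrame from the previous step this would be called as `tidy_sheet(table)`. Section-title rows such as 【スペシャルアート(TAG TEAM GX)】 are kept, because they are not empty in every column.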
CodePudding user response:
<iframe style="border-width: 2px; border-style: solid; border-color: red; width: 1000px; height: 200000px;" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQT3Q9qDbZUpnP3_WH2I5qw8O-U_PqXVhhoIzH2o-tSzeDND9FTuoGKbZiNHTbrzTgKAUA2_SvXFh_2/pubhtml?gid=159569114&single=true&widget=true&headers=false&gid=0&range=A:F" width="320" height="240"></iframe>
<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQT3Q9qDbZUpnP3_WH2I5qw8O-U_PqXVhhoIzH2o-tSzeDND9FTuoGKbZiNHTbrzTgKAUA2_SvXFh_2/pubhtml/sheet?headers=false&gid=159569114&range=A:F" style="display: block; width: 100%; height: 100%;"></iframe>
You need to switch to the inner iframe after switching to the outer one.
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#pageswitcher-content")))
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
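If it is not obvious which frame holds the element, a small diagnostic can walk the iframe tree and report where the locator first matches. The following is only a sketch: `find_frame_path` and `max_depth` are hypothetical names, and the helper relies solely on the standard WebDriver methods `find_elements`, `switch_to.frame`, `switch_to.parent_frame` and `switch_to.default_content`. The literal strings "tag name" and "xpath" are the values of `By.TAG_NAME` and `By.XPATH`, used here so the snippet needs no imports.

```python
def find_frame_path(driver, by, value, max_depth=5):
    """Depth-first search through nested iframes for a locator.

    Returns the list of iframe indices leading to the first frame that
    contains a match (an empty list means the top-level document), or
    None if nothing matches. On success the driver stays focused on the
    matching frame.
    """
    def search(path):
        if driver.find_elements(by, value):
            return path
        if len(path) >= max_depth:
            return None
        frame_count = len(driver.find_elements("tag name", "iframe"))
        for index in range(frame_count):
            # Re-locate the frames on every pass so no stale reference is used.
            frames = driver.find_elements("tag name", "iframe")
            if index >= len(frames):
                break
            driver.switch_to.frame(frames[index])
            found = search(path + [index])
            if found is not None:
                return found
            driver.switch_to.parent_frame()
        return None

    driver.switch_to.default_content()
    return search([])
```

For the page in the question, `find_frame_path(driver, "xpath", "//table[@class='waffle']")` should return a two-element list if the table really sits two iframes deep, which confirms that both switches are needed. It does not wait for frames to load, so call it after the page has finished loading.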