Python Selenium webscraping returns no data using XPATH

Time:03-10

I am trying to scrape data from a webpage. After logging in to the site, I can search for the XPath in the developer tools and find matches, but the Python code does not return any data.

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
browser.get(loginURL)

nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ", len(nifty_bank_values_xpath))

The output is

d:\My Personal\2022\Suresh\Learning\Python\zerodha.py:27: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")

DevTools listening on ws://127.0.0.1:57153/devtools/browser/74a90941-a12f-4be4-b12a-01b256292a5f
[15120:6400:0309/123030.129:ERROR:device_event_log_impl.cc(214)] [12:30:30.129] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[15120:6400:0309/123030.137:ERROR:device_event_log_impl.cc(214)] [12:30:30.136] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Len of nifty_bank_values_xpath: 0

Similarly, when I tried find_element:

nifty_bank_values_xpath = browser.find_element(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")

I get the following error:

    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//span[contains(@class, 'pane-legend-item-value__main')]"}
  (Session info: chrome=99.0.4844.51)
Stacktrace:
Backtrace:

The same XPath finds the data in DevTools -> Elements, returning 6 matches.

[Image: Developer Tools showing the matching rows]

HTML captured from the dev console:

<body >
   <noscript><strong>We're sorry but kite doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
   <div id="app" >
      <div >
         <div >
            <!----> 
            <div >
               <!----> <!----> 
               <div ><a href="/marketwatch" ><span ></span></a> <a href="/dashboard" ><span ></span></a> <a href="/orders" ><span ></span></a> <a href="/holdings" ><span ></span></a> <a href="/positions" ><span ></span></a> <a href="/funds" ><span ></span></a></div>
               <div >
                  <div >
                     <a href="" >
                        <div id="avatar-43">
                           <div  style="width: 25px; height: 25px; border-radius: 50%; text-align: center; vertical-align: middle; background-color: rgba(156, 39, 176, 0.1); font-size: 9px; font-weight: 300; color: rgb(156, 39, 176); line-height: 26px;"><span>SS</span></div>
                           <!---->
                        </div>
                        <span >ZX8487</span>
                     </a>
                     <!---->
                  </div>
               </div>
            </div>
         </div>
      </div>
      <div >
         <!----> 
         <div >
            <!----> <!----> 
            <div >
               <!----> 
               <div>
                  <div >
                     <div id="tv_chart_container"  style="height: 547px;"><iframe id="tradingview_e1a6c" name="tradingview_e1a6c" src="/static/tv-chart/static/en-tv-chart.aaac22e21df68f2f7bad.html#symbol=NIFTY BANK:INDICES:260105&amp;interval=1D&amp;widgetbar={"details":false,"watchlist":false,"watchlist_settings":{"default_symbols":[]}}&amp;timeFrames=[{"text":"5y","resolution":"W"},{"text":"1y","resolution":"W"},{"text":"6m","resolution":"120"},{"text":"3m","resolution":"60"},{"text":"1m","resolution":"30"},{"text":"5d","resolution":"5"},{"text":"1d","resolution":"1"}]&amp;locale=en&amp;uid=tradingview_e1a6c&amp;clientId=tradingview.com&amp;userId=ZX8487&amp;chartsStorageUrl=/api/chart/preferences&amp;chartsStorageVer=1.1&amp;customCSS=/static/tv-chart/static/custom_style.css&amp;debug=false&amp;timezone=Asia/Kolkata&amp;theme=Light" frameborder="0" allowtransparency="true" scrolling="no" allowfullscreen="" style="display: block; width: 100%; height: 100%;"></iframe></div>
                  </div>
                  <div >
                     <div >
                        <div >
                           <div >Open</div>
                           <div >33278.9</div>
                        </div>
                        <div >
                           <div >High</div>
                           <div >33890.9</div>
                        </div>
                        <div >
                           <div >Low</div>
                           <div >32948.9</div>
                        </div>
                        <div >
                           <div >Close</div>
                           <div >33158.1</div>
                        </div>
                     </div>
                     <div >
                        <div >
                           <div >Volume</div>
                           <div >—</div>
                        </div>
                        <div >
                           <div >Avg. trade price</div>
                           <div >—</div>
                        </div>
                        <div >
                           <div >Total buy quantity</div>
                           <div >—</div>
                        </div>
                        <div >
                           <div >Total sell quantity</div>
                           <div >—</div>
                        </div>
                     </div>
                  </div>
                  <!---->
               </div>
            </div>
         </div>
      </div>
      <!----> <!----> 
      <div >
         <!----> <!----> <!----> <!----> <!----> <!---->
      </div>
      <!----> 
      <div>
         <!----> <!---->
      </div>
      <!----> <!----> <!----> <!----> 
      <div >
         <!---->
      </div>
      <!----> <!---->
   </div>
   <script async="">try {
      var theme = JSON.parse(localStorage.__storejs_kite_theme);
      if (theme) {
        document.documentElement.setAttribute("data-theme", theme);
      }
      } catch (_) {
      }
   </script><script type="module" src="/static/js/chunk-vendors.ea6114a1.js"></script><script type="module" src="/static/js/app.ae4bb317.js"></script><script>!function(){var e=document,t=e.createElement("script");if(!("noModule"in t)&&"onbeforeload"in t){var n=!1;e.addEventListener("beforeload",function(e){if(e.target===t)n=!0;else if(!e.target.hasAttribute("nomodule")||!n)return;e.preventDefault()},!0),t.type="module",t.src=".",e.head.appendChild(t),t.remove()}}();</script><script src="/static/js/chunk-vendors-legacy.ea6114a1.js" nomodule=""></script><script src="/static/js/app-legacy.cffeb71c.js" nomodule=""></script>
   <div >
      <div >
         <div></div>
      </div>
      <div >
         <div></div>
      </div>
      <div >
         <div></div>
      </div>
      <div >
         <div></div>
      </div>
      <div >
         <div></div>
      </div>
      <div >
         <div></div>
      </div>
   </div>
   <!---->
</body>

CodePudding user response:

You are missing a wait here.
You should wait until the elements are fully loaded before accessing them with find_elements.
The best approach here is to use an explicit wait with Expected Conditions, as follows:

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))
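As an alternative to the fixed 0.3 s pause, the wait itself can poll until a minimum number of matches exists. The following is only a minimal sketch: `at_least_n_elements` is a hypothetical helper, not part of Selenium, and the expected count of 6 is an assumption taken from the number of matches DevTools reported in the question. It relies on the fact that `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value.

```python
def at_least_n_elements(locator, n):
    """Build a wait condition that passes once `locator` matches >= n elements.

    Hypothetical helper (not part of Selenium). `locator` is a (by, value)
    tuple such as (By.XPATH, "//span[...]").
    """
    def condition(driver):
        elements = driver.find_elements(*locator)
        # WebDriverWait.until keeps polling while the return value is falsy
        # and returns the first truthy value, here the list of elements.
        return elements if len(elements) >= n else False
    return condition


# Intended usage with the objects from the code above (6 is an assumed count):
# values = wait.until(at_least_n_elements(
#     (By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]"), 6))
# print("Len of nifty_bank_values_xpath: ", len(values))
```

If fewer than `n` elements ever appear, the wait raises a TimeoutException after the configured timeout instead of silently returning a partial list.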

Update:
Since that element is inside an iframe, you have to switch to that iframe before accessing the elements inside it, as follows:

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
# The iframe id in the posted HTML is "tradingview_e1a6c", so match on the prefix
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'tradingview')]")))

wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))

When you have finished working with the elements inside the iframe, switch back to the default content with:

browser.switch_to.default_content()
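One way to avoid forgetting the switch back is to wrap both steps in a context manager. This is a sketch under assumptions, not a Selenium feature: `inside_frame` is a hypothetical helper name, and it only uses `switch_to.frame()` and `switch_to.default_content()`.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level document.

    Hypothetical helper (not part of Selenium). `frame` can be anything that
    driver.switch_to.frame() accepts: a WebElement, a name/id string, or an index.
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups start from the main page.
        driver.switch_to.default_content()


# Intended usage (names assumed from the code above):
# iframe = browser.find_element(By.XPATH, "//iframe[starts-with(@id, 'tradingview')]")
# with inside_frame(browser, iframe):
#     values = browser.find_elements(
#         By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
#     texts = [v.text for v in values]  # read the text while still inside the frame
```

Reading the element text inside the `with` block is deliberate: element references obtained inside a frame should not be relied on after switching back to the top-level document.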

CodePudding user response:

Thank you, Prophet, for guiding me to explore the iframe.

browser.get(niftybankchartURL)
time.sleep(10)

# jump into iframe
browser.switch_to.frame(browser.find_element(By.TAG_NAME, "iframe"))

Once I had switched into the frame, the XPath worked fine.
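For anyone hitting the same symptom (matches in DevTools, zero in Selenium), a quick check is to count the matches in the top-level page and in each iframe separately. The sketch below assumes a hypothetical helper, `count_matches_per_frame`, which is not part of Selenium; it only inspects direct child iframes and does not follow nested ones.

```python
def count_matches_per_frame(driver, locator, frame_locator):
    """Count matches for `locator` in the top-level page and in each direct iframe.

    Hypothetical diagnostic helper (not part of Selenium). Both locators are
    (by, value) tuples, e.g. (By.XPATH, "//span[...]") and (By.TAG_NAME, "iframe").
    """
    driver.switch_to.default_content()
    counts = {"top": len(driver.find_elements(*locator))}
    frame_count = len(driver.find_elements(*frame_locator))
    for index in range(frame_count):
        # Go back to the top level and re-locate the frames on each pass,
        # rather than reusing references from before a context switch.
        driver.switch_to.default_content()
        frames = driver.find_elements(*frame_locator)
        if index >= len(frames):
            break  # the page changed while we were looking
        driver.switch_to.frame(frames[index])
        counts[f"iframe[{index}]"] = len(driver.find_elements(*locator))
    driver.switch_to.default_content()
    return counts


# Intended usage (names assumed from the code above):
# print(count_matches_per_frame(
#     browser,
#     (By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]"),
#     (By.TAG_NAME, "iframe")))
```

A zero under "top" and a non-zero count under one of the iframe keys indicates that the driver has to switch into that frame before the XPath can match.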

Thank you all for the guidance and patience.
