How to scrape data from a table

Time:11-29

I am trying to scrape data from a table, but the code below prints an empty list:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
driver= webdriver.Chrome('C:\Program Files (x86)\chromedriver.exe')
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
tabledata = driver.find_elements_by_xpath("//tbody/tr")
print(tabledata)

CodePudding user response:

The <table> element is inside an <iframe>, so you have to:

  • Induce WebDriverWait for the desired frame to be available and switch to it.

  • Induce WebDriverWait for visibility_of_element_located() on the desired element. You can use either of the following Locator Strategies:

    • Using XPATH:

      driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='Inline Frame Example']")))
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//tbody"))).text)
      
    • Using CSS_SELECTOR:

      driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#sites tbody"))).text)
      
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    FAM-1293 AmTech Ingredients Albert Lea UNITED STATES Valid 2020-10-08 2023-10-07
    FAM-0841 3F FEED & FOOD S L Vizcolozano SPAIN Valid 2020-04-17 2023-04-16
    FAM-1361 5N Plus Additives GmbH Eisenhüttenstadt GERMANY Valid 2020-10-01 2023-09-30
    FAM-1301-01 A & V Corp. Limited Xiamen CHINA Valid 2020-09-09 2023-09-08
    FAM-1146 A.   E. Fischer-Chemie GmbH & Co. KG Wiesbaden GERMANY Valid 2020-06-05 2023-06-04
    FAM-1589 A.M FOOD CHEMICAL CO LIMITED Jinan CHINA Valid 2020-01-07 2023-01-06
    FAM-0613-01 A.W.P. S.r.l Crevalcore ITALY Valid 2020-02-27 2023-02-07
    FAM-0867 AB AGRI POLSKA Sp. z o.o. Smigiel POLAND Valid 2020-08-03 2023-03-19
    FAM-1510-02 AB Vista Marlborough UNITED KINGDOM Valid 2020-04-16 2023-04-15
    FAM-1510-01 AB Vista * Rotterdam NETHERLANDS Valid 2020-04-16 2023-04-15
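
If you need the individual columns rather than one block of text, something along these lines may help. This is only a minimal sketch: it assumes you pass in the tbody element returned by the wait above and that the data cells are plain <td> elements. The helper name rows_as_cells is hypothetical, and the string "tag name" is the value behind By.TAG_NAME, used here so the snippet needs no imports.

```python
def rows_as_cells(tbody):
    """Return one list of cell strings per <tr> found under `tbody`.

    `tbody` is expected to behave like a Selenium WebElement, i.e. it
    offers find_elements(by, value) and its children expose `.text`.
    """
    records = []
    # "tag name" is the string value of By.TAG_NAME
    for row in tbody.find_elements("tag name", "tr"):
        cells = [cell.text.strip() for cell in row.find_elements("tag name", "td")]
        if cells:  # skip rows without <td> cells, such as header rows
            records.append(cells)
    return records
```

You would call it as rows_as_cells(tbody) after the frame switch, where tbody is the element the visibility wait returned. Each record is then a list of column values, which avoids having to guess where the company name ends and the city begins in the flat text.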
    

CodePudding user response:

  1. The table element you are trying to access is inside an iframe. You have to switch to that iframe first in order to access its elements.
  2. Printing the list of elements does not show their contents. You should read each element's text and print that.
    Try this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Raw string so the backslashes in the Windows path are taken literally.
# Selenium 3 style call; recent Selenium 4 releases take the driver path through a Service object.
driver = webdriver.Chrome(r'C:\Program Files (x86)\chromedriver.exe')
wait = WebDriverWait(driver, 20)
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
# Switch into the iframe that contains the table
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#inlineFrameExample")))
# Wait until the first row is visible, then collect every row and print its text
wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody/tr")))
for row in driver.find_elements(By.XPATH, "//tbody/tr"):
    print(row.text)
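
Keep in mind that after the switch the driver stays inside the iframe until you switch back. A small sketch of a context manager that always returns to the main document is shown here. The name inside_frame is hypothetical; it relies only on driver.switch_to.frame() and driver.switch_to.default_content(), and unlike the wait-based switch above it does not wait for the frame to become available.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame, then switch back to the top-level document.

    `frame_reference` is whatever driver.switch_to.frame() accepts:
    a frame WebElement, a name or id string, or an index.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups start
        # from the main document again.
        driver.switch_to.default_content()
```

With this, the table lookup would go inside a "with inside_frame(driver, frame):" block, and anything after the block looks for elements in the main page again.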