How to extract data from a dynamic table with Selenium in Python?

Time:12-14

I'm trying to extract data from a website. I need to enter a value in the search box and run the search, which generates a table. Once the table is generated, I need to write its contents to a text file or insert them into a database. This is what I have tried so far.

Website: https://commtech.byu.edu/noauth/classSchedule/index.php
Search text: "C S 142"

Sample Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

from selenium.webdriver.chrome.service import Service

from selenium.webdriver.chrome.options import Options
c_options = Options()
c_options.add_experimental_option("detach", True)

s = Service('C:/Users/sidat/OneDrive/Desktop/python/WebDriver/chromedriver.exe')



URL = "http://saasta.byu.edu/noauth/classSchedule/index.php"
driver = webdriver.Chrome(service=s, options=c_options)
driver.get(URL)
element = driver.find_element("id", "searchBar")
element.send_keys("C S 142", Keys.RETURN)
search_button = driver.find_element("id", "searchBtn")
search_button.click()

table = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='sectionTable']")))

rows = table.find_elements("xpath", "//tr")

for row in rows:
    cells = row.find_elements(By.TAG_NAME, "td")
    for cell in cells:
        print(cell.text)

I'm using PyCharm 2022.3 to write and test the code. My code prints nothing. Please help me solve this problem so that I can extract the data to a text file and to an SQL database table.

CodePudding user response:

Try this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

c_options = Options()
c_options.add_experimental_option("detach", True) 
s = Service('C:/Users/sidat/OneDrive/Desktop/python/WebDriver/chromedriver.exe')

driver = webdriver.Chrome(service=s, options=c_options)
URL = "http://saasta.byu.edu/noauth/classSchedule/index.php"
driver.get(URL)
driver.maximize_window()
element = driver.find_element("id", "searchBar")
element.send_keys("C S 142", Keys.RETURN)
search_button = driver.find_element("id", "searchBtn")
search_button.click()

header = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//table[@id='sectionTable']/thead/tr/th")))
for th in header:
    print(f"{th.get_attribute('textContent')}")


rows = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//table[@id='sectionTable']/tbody/tr")))
for i in range(0, len(rows)):
    cells = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, f"(//table[@id='sectionTable']/tbody/tr)[{i + 1}]//td")))
    for cell in cells:
        print(cell.get_attribute('textContent'))

You are waiting for the table, which is correct, but the table is not fully loaded at that point (the td cells are not there yet). Wait for the cells themselves:

    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='sectionTable']//td")))

That way you wait until at least one td element exists inside the table.
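
Note that presence_of_element_located only checks that a td is in the DOM, not that it already contains text. If the cells are filled in later, a custom wait condition may help. The following is a minimal sketch, not part of the original answer: cells_have_text is a hypothetical helper name, and it assumes that reading textContent is enough to tell whether a cell has been filled.

```python
def cells_have_text(locator):
    """Build a wait condition that succeeds once any located cell has non-blank text.

    `locator` is a (by, value) tuple such as ("xpath", "//table//td").
    """
    def _condition(driver):
        cells = driver.find_elements(*locator)
        filled = [
            cell for cell in cells
            if (cell.get_attribute("textContent") or "").strip()
        ]
        # A falsy return value tells WebDriverWait to keep polling.
        return cells if filled else False

    return _condition
```

With the imports used in the answers above, it could be called as WebDriverWait(driver, 10).until(cells_have_text((By.XPATH, "//table[@id='sectionTable']//td"))), which returns the list of cells once at least one of them has text.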

CodePudding user response:

The following code prints the content of the table you asked for.
Wait for an element to be clickable before you click it or send text to it, and wait for it to be visible before you read its text content.
The output is still not formatted, though.

from selenium import webdriver
from selenium.webdriver import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 30)

url = "http://saasta.byu.edu/noauth/classSchedule/index.php"
driver.get(url)

wait.until(EC.element_to_be_clickable((By.ID, "searchBar"))).send_keys("C S 142", Keys.RETURN)
wait.until(EC.element_to_be_clickable((By.ID, "searchBtn"))).click()

table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='sectionTable']")))

rows = table.find_elements("xpath", "//tr")

for row in rows:
    cells = row.find_elements(By.TAG_NAME, "td")
    for cell in cells:
        print(cell.text)

The output is:

3 credit hours, 3 class hours a week, and 0 lab hours



C S 142 section 002: C S 142 is no longer being offered. Students in programs that require this course will need to instead take C S 110 or C S 111. Please (1) refer to this document (https://****) for details about these two new courses and (2) talk with the advisors for your program about which one your program will be using in place of C S 142.

Fall, Winter, Spring
002
DAY
Classroom

3.00




TBA
0/0
0
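
Neither answer covers the part of the question about saving the data. The following is a rough sketch of one way to do it, using only the standard library for storage. The helper names (table_to_rows, save_as_text, save_to_sqlite), the table name "sections" and the column names col0, col1, ... are placeholders chosen for illustration, not anything the site or Selenium defines. The row lookup uses ".//tr", which is relative to the table element, whereas "//tr" searches the whole page.

```python
import csv
import sqlite3


def table_to_rows(table):
    """Return one list of cell strings for each table row that has non-blank <td> cells."""
    rows = []
    # ".//tr" is evaluated relative to `table`; a bare "//tr" would match every row on the page.
    for tr in table.find_elements("xpath", ".//tr"):
        cells = [
            (td.get_attribute("textContent") or "").strip()
            for td in tr.find_elements("xpath", "./td")
        ]
        if any(cells):  # skip header rows (no td) and rows with only blank cells
            rows.append(cells)
    return rows


def save_as_text(rows, path):
    """Write rows to a tab-separated text file, one table row per line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)


def save_to_sqlite(rows, db_path, table_name="sections"):
    """Insert rows into an SQLite table with placeholder column names col0, col1, ...

    `table_name` is interpolated into the SQL, so it must be a trusted value.
    Returns the number of rows inserted.
    """
    if not rows:
        return 0
    width = max(len(row) for row in rows)
    columns = ", ".join(f"col{i} TEXT" for i in range(width))
    placeholders = ", ".join("?" for _ in range(width))
    # Pad short rows so that every row matches the column count.
    padded = [list(row) + [""] * (width - len(row)) for row in rows]
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})', padded
            )
    finally:
        conn.close()
    return len(padded)
```

After waiting for the table as in the answers above, rows = table_to_rows(table) followed by save_as_text(rows, "sections.txt") and save_to_sqlite(rows, "sections.db") would store the data. The same pattern should carry over to another SQL database through its own driver, though the placeholder syntax may differ.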
