Home > Software engineering >  python selenium find element is slow
python selenium find element is slow

Time:11-02

<body>
    <tbody id="data-table">
       <tr>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
       </tr>
       <tr>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
       </tr>
       <tr>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
          <td>
          </td>
       </tr>
    </tbody>
</body>

I need a fast way to find the texts contained within each <td>

I tried

main_table = driver.find_element(By.ID, "data-table")
for i in range(3):
     main_table.find_element(By.XPATH, "tr["   str(i   1)   "]/td[1]").text
     main_table.find_element(By.XPATH, "tr["   str(i   1)   "]/td[2]").text
     main_table.find_element(By.XPATH, "tr["   str(i   1)   "]/td[3]").text

this is incredibly slow... nearly 200ms for each search
this simple loop takes over 3 x 3 x 200 ms or 1.8 sec

the actual data I need to extract is even bigger, its over 100 tr and each having 5 td
this takes over 100 secs to complete

is there a faster way to do this?

I was wondering if there is a way to just extract all the tags under the main table for example

extracted_data = main_table.get_all_tags()
for tr in extracted_data:
    for td in tr:
        print(td.text)   

the idea is we extract all the sub-tags data and then use pure python to further extract the sub data instead of crawling it using find_element

CodePudding user response:

Try:

for tr in driver.find_elements(By.XPATH, '//*[@id="data-table"]/tr'):
    td1= tr.find_element(By.XPATH,'.//td[1]').text
    td2= tr.find_element(By.XPATH,'.//td[2]').text
    td3= tr.find_element(By.XPATH,'.//td[3]').text

CodePudding user response:

If you're just looking for the text in each td tag you could do:

main_table = driver.find_element(By.XPATH, '//*[@id="data-table"]/tr/td')
for xin main_table:
   print(x.text)

CodePudding user response:

Identify the table element and get the outerHtml of the table element first, Then use pandas to read the html

main_table = driver.find_element(By.XPATH, "//table[.//tbody[@id='data-table']]").get_attribute("outerHtml")
df=pd.read_html(main_table)[0]
print(df)

Import following library

import pandas as pd

If pandas,not installed then install it first

pip install pandas
  • Related