<body>
  <tbody id="data-table">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
  </tbody>
</body>
I need a fast way to find the text contained within each <td>. I tried:

from selenium.webdriver.common.by import By

main_table = driver.find_element(By.ID, "data-table")
for i in range(3):
    main_table.find_element(By.XPATH, "tr[" + str(i + 1) + "]/td[1]").text
    main_table.find_element(By.XPATH, "tr[" + str(i + 1) + "]/td[2]").text
    main_table.find_element(By.XPATH, "tr[" + str(i + 1) + "]/td[3]").text
This is incredibly slow: nearly 200 ms for each search, so this simple loop takes over 3 x 3 x 200 ms, or 1.8 seconds. The actual data I need to extract is even bigger: it's over 100 tr, each having 5 td, so it takes over 100 seconds to complete. Is there a faster way to do this?
I was wondering if there is a way to just extract all the tags under the main table in one go, for example:

extracted_data = main_table.get_all_tags()
for tr in extracted_data:
    for td in tr:
        print(td.text)

The idea is to extract all the sub-tag data at once and then use pure Python to pull out what I need, instead of crawling the DOM with find_element.
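For illustration, a minimal sketch of that idea (get_all_tags is hypothetical and does not exist in Selenium; this version fetches the tbody's innerHTML in a single call and parses it with the standard library's html.parser):

from html.parser import HTMLParser

class TdTextParser(HTMLParser):
    # Collects the text of every <td>, grouped by <tr>.
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data

# One browser round trip for the whole table, instead of one per cell.
parser = TdTextParser()
parser.feed(main_table.get_attribute("innerHTML"))
for row in parser.rows:
    for td_text in row:
        print(td_text)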
CodePudding user response:
Try:

for tr in driver.find_elements(By.XPATH, '//*[@id="data-table"]/tr'):
    td1 = tr.find_element(By.XPATH, './/td[1]').text
    td2 = tr.find_element(By.XPATH, './/td[2]').text
    td3 = tr.find_element(By.XPATH, './/td[3]').text
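A variant of the same idea that fetches each row's cells in one lookup instead of one XPath query per column (a sketch):

for tr in driver.find_elements(By.XPATH, '//*[@id="data-table"]/tr'):
    # One find_elements call per row, then read the texts.
    cells = [td.text for td in tr.find_elements(By.TAG_NAME, "td")]
    print(cells)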
CodePudding user response:
If you're just looking for the text in each td tag, you could do:

tds = driver.find_elements(By.XPATH, '//*[@id="data-table"]/tr/td')
for x in tds:
    print(x.text)
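Each .text read is still one WebDriver round trip per cell. If that is still too slow, all the texts can be pulled in a single JavaScript call (a sketch using Selenium's execute_script):

# One round trip: the browser collects every td's text and returns a list.
texts = driver.execute_script(
    "return Array.from(document.querySelectorAll('#data-table td'))"
    ".map(td => td.textContent.trim());"
)
print(texts)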
CodePudding user response:
Identify the table element, get its outerHTML first, and then use pandas to read the HTML:

main_table = driver.find_element(By.XPATH, "//table[.//tbody[@id='data-table']]").get_attribute("outerHTML")
df = pd.read_html(main_table)[0]
print(df)

Import the following library:

import pandas as pd

If pandas is not installed, install it first:

pip install pandas
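One caveat, assuming a recent pandas: since version 2.1, passing a literal HTML string to read_html is deprecated, so wrapping it in io.StringIO avoids the warning (a sketch):

from io import StringIO

# read_html returns a list of DataFrames; the table of interest is the first.
df = pd.read_html(StringIO(main_table))[0]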