Hello everyone, I am trying to scrape a table from a JavaScript-based website. Oddly, the table is split up so that each row sits inside its own table tag. I cannot share the website because it is on an internal server, but I have attached some of the HTML below:
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div role="row" aria-selected="false" idref="admin" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">admin</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="locked">
<div ></div>
Locked
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">08/07/13 07:17:49 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user1" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user1</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="connected">
<div ></div>
Connected (2)
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/04/22 03:37:32 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user2" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user2</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/27/22 09:55:30 AM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user3" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user3</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">12/18/19 03:56:05 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user4" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user4</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/20/22 05:49:45 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user5" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user5</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/19/22 12:16:31 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user6" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/01/22 03:24:16 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="secadmin" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">secadmin</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="locked">
<div ></div>
Locked
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="security_admin">Security administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/07/21 03:28:40 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="tpcuser" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">tpcuser</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="monitor">Monitor</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">PUBLIC</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">03/03/21 06:00:33 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user6" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/10/22 12:39:54 PM</td>
</tr>
</tbody>
</table>
</div>
</div>
My Code:
for i in range(10):
    print(i)
    Username = driver.find_elements(By.XPATH, value="//*[@id='dojox_grid_View_1']/div/div/div/div[" + str(i) + "]/table/tbody/tr/td[1]")
    print(Username)
I am confused about how to loop through the rows correctly, as the XPaths of the name cells are as follows:
admin = //*[@id='dojox_grid_View_1']/div/div/div/div[1]/table/tbody/tr/td[1]
User1 = //*[@id='dojox_grid_View_1']/div/div/div/div[2]/table/tbody/tr/td[1]
User2 = //*[@id='dojox_grid_View_1']/div/div/div/div[3]/table/tbody/tr/td[1]
etc.
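The XPaths listed above differ only in the index of the last div, and XPath positions start at 1, so a loop index that starts at 0 produces div[0], which matches nothing. A minimal sketch of building the per-row expressions, assuming the ten rows shown in the HTML above (the names BASE and row_xpath are hypothetical, introduced only for illustration):

```python
# Common prefix of the XPaths quoted in the question.
BASE = "//*[@id='dojox_grid_View_1']/div/div/div"


def row_xpath(i):
    """Return the XPath of the first cell in the i-th row div (i starts at 1)."""
    return BASE + "/div[" + str(i) + "]/table/tbody/tr/td[1]"


# XPath positions are 1-based, so iterate over 1..10 rather than 0..9.
xpaths = [row_xpath(i) for i in range(1, 11)]
print(xpaths[0])
```

Each string could then be passed to driver.find_element(By.XPATH, ...) and its .text appended to a list.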
I want the output to be a list, e.g.
[admin, user1, user2, user3, user4]
I am so sorry if this question is confusing; I wrote it as best I could.
CodePudding user response:
Find all the tables:
tables = driver.find_elements(By.XPATH, '//table[@]')
Then loop over these tables and take the first td of each (tables[1:] skips the first table, which holds the column names wrapped in <th>):
print([table.find_element(By.XPATH, './tbody/tr/td[1]').text for table in tables[1:]])
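The attribute test in the table XPath above appears incomplete as posted. As a hedged alternative, the sketch below selects the name cells directly through attributes that are visible in the HTML from the question (role="row" and fieldname="name"). The names NAME_CELLS_XPATH and extract_names are hypothetical, and the code assumes the live page uses the same attributes as the pasted snippet.

```python
# Matches every name cell inside a row div, using attributes from the posted HTML.
NAME_CELLS_XPATH = "//div[@role='row']//td[@fieldname='name']"


def extract_names(driver):
    """Return the text of every name cell, in document order."""
    # "xpath" is the string value of selenium's By.XPATH locator strategy,
    # so this call is equivalent to driver.find_elements(By.XPATH, ...).
    cells = driver.find_elements("xpath", NAME_CELLS_XPATH)
    return [cell.text.strip() for cell in cells]
```

Because the XPath only matches td elements that carry fieldname="name", a header table built from <th> cells is never selected, so no slicing or exception handling is needed.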
/------test-------/
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('file:///D:/Documents/python/projects/test/test2.html')
tables = driver.find_elements(By.XPATH, '//table[@]')
res = []
for table in tables:
    try:
        res.append(table.find_element(By.XPATH, './tbody/tr/td[1]').text)
    except Exception:
        # Skip tables that have no td cell, e.g. a header table built from <th>
        pass
print(res)
Output:
['admin', 'user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'secadmin', 'tpcuser', 'user6']
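If the other columns are needed as well, note that the scope and lastAuthenticatedTime cells in the posted HTML are styled display:none, and Selenium's .text only returns visible text, so it would come back empty for them. A rough sketch that reads textContent instead and keys each value by the cell's fieldname attribute (row_to_dict and extract_rows are hypothetical helper names, and the same markup as in the question is assumed):

```python
def row_to_dict(row):
    """Map each cell's fieldname attribute to its whitespace-normalised text."""
    record = {}
    for cell in row.find_elements("xpath", ".//td[@role='gridcell']"):
        # textContent also covers cells hidden with display:none,
        # for which WebElement.text returns an empty string.
        text = cell.get_attribute("textContent") or ""
        record[cell.get_attribute("fieldname")] = " ".join(text.split())
    return record


def extract_rows(driver):
    """Return one dict per row div found on the page."""
    rows = driver.find_elements("xpath", "//div[@role='row']")
    return [row_to_dict(row) for row in rows]
```

A list of names can then be derived with [r["name"] for r in extract_rows(driver)].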