I have been trying to scrape a table from a Java-based website. However, the table data is split across separate tables, one per row. I need to scrape the names and roles from this table.
Unfortunately, I cannot share the URL because it is an internal website, but I have included the HTML below. Names have been changed for security reasons.
This is the website's HTML:
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div role="row" aria-selected="false" idref="admin" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">admin</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="locked">
<div ></div>
Locked
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">08/07/13 07:17:49 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user1" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user1</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="connected">
<div ></div>
Connected (2)
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/04/22 03:37:32 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user2" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user2</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/27/22 09:55:30 AM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user3" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user3</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">12/18/19 03:56:05 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user4" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user4</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/20/22 05:49:45 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user5" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user5</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/19/22 12:16:31 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user6" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/01/22 03:24:16 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="secadmin" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">secadmin</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="locked">
<div ></div>
Locked
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="security_admin">Security administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/07/21 03:28:40 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="tpcuser" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">tpcuser</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="monitor">Monitor</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">PUBLIC</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">03/03/21 06:00:33 PM</td>
</tr>
</tbody>
</table>
</div>
<div role="row" aria-selected="false" idref="user6" style="">
<table border="0" cellspacing="0" cellpadding="0" role="presentation">
<tbody>
<tr>
<td tabindex="-1" role="gridcell" idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
<td tabindex="-1" role="gridcell" idx="1" style="width:12em;" hilite="1" fieldname="state">
<div data-state="disconnected">
<div ></div>
Disconnected
</div>
</td>
<td tabindex="-1" role="gridcell" idx="2" style="width:14em;" hilite="1" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
<td tabindex="-1" role="gridcell" idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
<td tabindex="-1" role="gridcell" idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/10/22 12:39:54 PM</td>
</tr>
</tbody>
</table>
</div>
</div>
Users may be added in the future, so I cannot hard-code the number of loop iterations.
My code is the following:
#Scrape the user table
Username = driver.find_elements(By.XPATH, value="/html/body/div[7]/div/div[3]/div[1]/div/div/div[2]/div/div[3]/div/div/div/div/div[1]/table/tbody/tr/td[1]")
Role = driver.find_elements(By.XPATH, value="/html/body/div[7]/div/div[3]/div[1]/div/div/div[2]/div/div[3]/div/div/div/div/div[1]/table/tbody/tr/td[3]")
for i in range(len(Username)):
    if Username[i].text in userdict:
        UserGet = userdict.get(Username[i].text)
        print("Company Name|F|DS0000|Company Role|" + UserGet + "|enabled|||" + Role[i].text)
        List.append("Company Name|F|DS0000|Company Role|" + UserGet + "|enabled|||" + Role[i].text)
    else:
        print(Username[i].text + " Not in User Dictionary")
I hope my question makes sense, and I appreciate any help provided.
CodePudding user response:
If you correct the XPath values, it should give you the expected result. I tested the expressions on http://xpather.com/, but I am not very familiar with dictionaries in Python, so I could not fully test the code. Let me know if there are any issues and I can try to resolve them.
#Scrape the user table
Username = driver.find_elements(By.XPATH, value="//td[@role][1]")
Role = driver.find_elements(By.XPATH, value="//td/span[@tooltipkey]")
for i in range(len(Username)):
    if Username[i].text in userdict:
        UserGet = userdict.get(Username[i].text)
        print("Company Name|F|DS0000|Company Role|" + UserGet + "|enabled|||" + Role[i].text)
        List.append("Company Name|F|DS0000|Company Role|" + UserGet + "|enabled|||" + Role[i].text)
    else:
        print(Username[i].text + " Not in User Dictionary")
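Pairing two separate lists by index works only while both lists stay the same length and in the same order. A possibly more robust variant is to walk each row and look up the name and role cells relative to that row, using the fieldname attributes from the posted HTML. The sketch below is only an illustration: it parses a trimmed copy of the markup with the standard library instead of a live browser, and extract_users, format_lines, SAMPLE and the sample dictionary are hypothetical names, not part of the original code. With Selenium, the same idea would presumably be driver.find_elements(By.XPATH, "//div[@role='row']") followed by row.find_element(By.XPATH, ".//td[@fieldname='name']") on each row.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grid markup from the question (two rows only).
SAMPLE = """
<div role="presentation">
  <div role="row" idref="admin">
    <table role="presentation"><tbody><tr>
      <td role="gridcell" fieldname="name">admin</td>
      <td role="gridcell" fieldname="state"><div data-state="locked"><div></div>Locked</div></td>
      <td role="gridcell" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
    </tr></tbody></table>
  </div>
  <div role="row" idref="tpcuser">
    <table role="presentation"><tbody><tr>
      <td role="gridcell" fieldname="name">tpcuser</td>
      <td role="gridcell" fieldname="state"><div data-state="disconnected"><div></div>Disconnected</div></td>
      <td role="gridcell" fieldname="role"><span tooltipkey="monitor">Monitor</span></td>
    </tr></tbody></table>
  </div>
</div>
"""


def extract_users(markup):
    """Return (name, role) pairs, one per row, however many rows exist."""
    root = ET.fromstring(markup.strip())
    users = []
    for row in root.findall(".//div[@role='row']"):
        # Search relative to the current row so name and role always match.
        name_cell = row.find(".//td[@fieldname='name']")
        role_cell = row.find(".//td[@fieldname='role']")
        if name_cell is None or role_cell is None:
            continue  # skip rows that lack either cell
        name = "".join(name_cell.itertext()).strip()
        role = "".join(role_cell.itertext()).strip()
        users.append((name, role))
    return users


def format_lines(users, userdict):
    """Build the pipe-delimited lines for users present in userdict."""
    lines = []
    for name, role in users:
        mapped = userdict.get(name)
        if mapped is None:
            print(name + " Not in User Dictionary")
            continue
        lines.append("Company Name|F|DS0000|Company Role|" + mapped + "|enabled|||" + role)
    return lines
```

Because the loop runs over whatever rows are found, no fixed row count is needed when users are added later.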
CodePudding user response:
I recently found a new automation library that provides several ways to navigate between elements, such as parent, child, next_sibling, or previous_sibling: https://www.clickcorp.com/documents#api/python/webdriver/browser/browsertab/webelement/webelement