Scraping names from a JavaScript-based table using Selenium

Time:07-07

Hello everyone, I am trying to scrape a table from a JavaScript-based website. The strange part is that the table is split across separate <table> tags, one per row. I cannot share the website because it is on an internal server, but I have attached some of the HTML below:

<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
   <div  role="row" aria-selected="false" idref="admin" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">admin</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="locked">
                     <div ></div>
                     Locked
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">08/07/13 07:17:49 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user1" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user1</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="connected">
                     <div ></div>
                     Connected (2)
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/04/22 03:37:32 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user2" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user2</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/27/22 09:55:30 AM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user3" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user3</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">12/18/19 03:56:05 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user4" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user4</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/20/22 05:49:45 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user5" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user5</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/19/22 12:16:31 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user6" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">07/01/22 03:24:16 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="secadmin" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">secadmin</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="locked">
                     <div ></div>
                     Locked
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="security_admin">Security administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">06/07/21 03:28:40 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="tpcuser" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">tpcuser</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="monitor">Monitor</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">PUBLIC</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">03/03/21 06:00:33 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
   <div  role="row" aria-selected="false" idref="user6" style="">
      <table  border="0" cellspacing="0" cellpadding="0" role="presentation">
         <tbody>
            <tr>
               <td tabindex="-1" role="gridcell"  idx="0" style="width:14em;" hilite="1" fieldname="name">user6</td>
               <td tabindex="-1" role="gridcell"  idx="1" style="width:12em;" hilite="1" fieldname="state">
                  <div  data-state="disconnected">
                     <div ></div>
                     Disconnected
                  </div>
               </td>
               <td tabindex="-1" role="gridcell"  idx="2" style="width:14em;" hilite="1" fieldname="role"><span  tooltipkey="admin">Administrator</span></td>
               <td tabindex="-1" role="gridcell"  idx="3" style="display:none;width:10em;" hilite="1" fieldname="scope">*</td>
               <td tabindex="-1" role="gridcell"  idx="4" style="display:none;width:16em;" hilite="1" fieldname="lastAuthenticatedTime">05/10/22 12:39:54 PM</td>
            </tr>
         </tbody>
      </table>
   </div>
</div>
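A minimal sketch of reading this markup row by row, assuming only the structure shown above: each row is a div with role="row" wrapping a one-row table, and every cell names its column through the fieldname attribute. So that it runs without a browser, it parses a trimmed copy of two rows with Python's standard-library xml.etree.ElementTree instead of Selenium; the helper read_rows is a hypothetical name. In Selenium the same selectors would be //div[@role='row'] and .//td[@fieldname], with one caveat: .text only returns rendered text, so the hidden scope and lastAuthenticatedTime cells (display:none) would come back empty there, whereas get_attribute('textContent') reads them regardless.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of two rows from the question's markup (hidden columns left out).
HTML = """<div role="presentation">
  <div role="row" idref="admin"><table role="presentation"><tbody><tr>
    <td role="gridcell" idx="0" fieldname="name">admin</td>
    <td role="gridcell" idx="1" fieldname="state"><div data-state="locked"><div></div>
      Locked
    </div></td>
    <td role="gridcell" idx="2" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
  </tr></tbody></table></div>
  <div role="row" idref="user1"><table role="presentation"><tbody><tr>
    <td role="gridcell" idx="0" fieldname="name">user1</td>
    <td role="gridcell" idx="1" fieldname="state"><div data-state="connected"><div></div>
      Connected (2)
    </div></td>
    <td role="gridcell" idx="2" fieldname="role"><span tooltipkey="admin">Administrator</span></td>
  </tr></tbody></table></div>
</div>"""


def read_rows(root):
    """Turn every role="row" div into a {fieldname: text} dict (hypothetical helper)."""
    rows = []
    for row in root.iterfind(".//div[@role='row']"):
        record = {}
        for cell in row.iterfind(".//td[@fieldname]"):
            # itertext() collects text from nested span/div tags; strip() drops layout whitespace
            record[cell.get("fieldname")] = "".join(cell.itertext()).strip()
        rows.append(record)
    return rows


rows = read_rows(ET.fromstring(HTML))
for r in rows:
    print(r)
```

Reading by fieldname rather than by column position means the code does not depend on the order of the cells inside each row.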

My code:

from selenium.webdriver.common.by import By

for i in range(10):
    print(i)
    Username = driver.find_elements(By.XPATH, value="//*[@id='dojox_grid_View_1']/div/div/div/div[" + str(i) + "]/table/tbody/tr/td[1]")
    print(Username)

I am confused about how to loop through the rows correctly, because the tables' XPaths are as follows:

admin = //*[@id='dojox_grid_View_1']/div/div/div/div[1]/table/tbody/tr/td[1]
user1 = //*[@id='dojox_grid_View_1']/div/div/div/div[2]/table/tbody/tr/td[1]
user2 = //*[@id='dojox_grid_View_1']/div/div/div/div[3]/table/tbody/tr/td[1]
etc.
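In these paths only the position inside div[...] changes, and XPath positions start at 1, so a loop that begins at 0 asks for div[0], which never matches anything. A minimal sketch of a 1-based loop follows; it runs against a trimmed copy of the markup with the standard-library ElementTree (a stand-in for Selenium so no browser is needed), and counts the rows instead of hard-coding 10.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grid: three row divs, each wrapping a one-row table.
HTML = """<div>
  <div role="row"><table><tbody><tr><td fieldname="name">admin</td></tr></tbody></table></div>
  <div role="row"><table><tbody><tr><td fieldname="name">user1</td></tr></tbody></table></div>
  <div role="row"><table><tbody><tr><td fieldname="name">user2</td></tr></tbody></table></div>
</div>"""

root = ET.fromstring(HTML)
row_count = len(root.findall("div"))  # number of row divs directly under the container

names = []
for i in range(1, row_count + 1):  # XPath positions are 1-based
    cell = root.find(f"div[{i}]/table/tbody/tr/td[1]")
    names.append(cell.text)  # the cell's text, not the element object

print(names)  # ['admin', 'user1', 'user2']
```

With Selenium the loop body would presumably become driver.find_element(By.XPATH, f"//*[@id='dojox_grid_View_1']/div/div/div/div[{i}]/table/tbody/tr/td[1]").text, reading .text rather than printing the WebElement objects.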

I want the output to be a list, e.g.:

['admin', 'user1', 'user2', 'user3', 'user4', ...]
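Because every name cell carries fieldname="name", one way to get this list without building positional paths at all is to select those cells directly. The sketch below shows the idea with ElementTree on a trimmed copy of the markup so it runs offline; the Selenium equivalent would be [el.text for el in driver.find_elements(By.XPATH, "//td[@fieldname='name']")], which keeps working if rows are added or reordered.

```python
import xml.etree.ElementTree as ET

# Two rows trimmed from the question's markup.
HTML = """<div>
  <div role="row" idref="admin"><table><tbody><tr>
    <td fieldname="name">admin</td><td fieldname="role"><span>Administrator</span></td>
  </tr></tbody></table></div>
  <div role="row" idref="tpcuser"><table><tbody><tr>
    <td fieldname="name">tpcuser</td><td fieldname="role"><span>Monitor</span></td>
  </tr></tbody></table></div>
</div>"""

root = ET.fromstring(HTML)

# Select cells by their column name instead of by position.
names = [td.text for td in root.iterfind(".//td[@fieldname='name']")]
print(names)  # ['admin', 'tpcuser']
```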

I'm sorry if this question is confusing; I wrote it as best I could.

Answer:

  1. Find all the row tables (every row table in the HTML above carries role="presentation"):
tables = driver.find_elements(By.XPATH, "//table[@role='presentation']")
  2. Loop over these tables and read the first td of each.

    (tables[1:] skips the first table, which holds the column names wrapped in <th>.)

print([table.find_element(By.XPATH, './tbody/tr/td[1]').text for table in tables[1:]])
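The slice tables[1:] relies on the header table always being the first match. A hedged alternative is to keep only tables that actually contain a td: in Selenium, table.find_elements(By.XPATH, './tbody/tr/td[1]') returns an empty list instead of raising when nothing matches. The sketch compares both approaches on a hypothetical header table (column names in th) followed by two rows, using ElementTree's findall, which likewise returns an empty list, so it runs without a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical header table followed by two row tables.
HTML = """<div>
  <table role="presentation"><tbody><tr><th>Name</th><th>State</th></tr></tbody></table>
  <div role="row"><table role="presentation"><tbody><tr>
    <td fieldname="name">admin</td><td fieldname="state">Locked</td>
  </tr></tbody></table></div>
  <div role="row"><table role="presentation"><tbody><tr>
    <td fieldname="name">user1</td><td fieldname="state">Connected (2)</td>
  </tr></tbody></table></div>
</div>"""

root = ET.fromstring(HTML)
tables = root.findall(".//table[@role='presentation']")

# Option 1: the answer's slice, which assumes the header is the first table.
by_slice = [t.find("./tbody/tr/td[1]").text for t in tables[1:]]

# Option 2: keep only tables that contain a <td>, wherever the header sits.
by_filter = []
for t in tables:
    cells = t.findall("./tbody/tr/td[1]")  # empty list for the <th>-only header table
    if cells:
        by_filter.append(cells[0].text)

print(by_slice)   # ['admin', 'user1']
print(by_filter)  # ['admin', 'user1']
```

Both give the same result here; the filter version simply does not care where, or whether, a header table appears.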

Test:

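from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# Load a local test page
driver.get('file:///D:/Documents/python/projects/test/test2.html')

# Every row table in the question's HTML carries role="presentation"
tables = driver.find_elements(By.XPATH, "//table[@role='presentation']")

res = []
for table in tables:
    try:
        # The first cell of each row table holds the user name
        res.append(table.find_element(By.XPATH, './tbody/tr/td[1]').text)
    except NoSuchElementException:
        # Tables without a <td> (e.g. a <th> header row) are skipped
        pass
print(res)
driver.quit()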

Output:

['admin', 'user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'secadmin', 'tpcuser', 'user6']
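'user6' appears twice because the sample HTML contains two rows with idref="user6" (their lastAuthenticatedTime values differ, 07/01/22 and 05/10/22, so it is worth checking whether they really are the same account). If only distinct names are wanted, a minimal sketch is to drop repeats while keeping the first-seen order:

```python
# Names as printed by the script above.
res = ['admin', 'user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'secadmin', 'tpcuser', 'user6']

# dict keys preserve insertion order (Python 3.7+), so repeats are dropped in place.
unique = list(dict.fromkeys(res))
print(unique)
# ['admin', 'user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'secadmin', 'tpcuser']
```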