I am trying to scrape the data in a bunch of rows. I am able to expand an individual row using the following:
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="7858101"]'))).click()
The problem is that each row has a different id. The rows share common class names, so I have also tried:
WebDriverWait(driver, 60).until(EC.presence_of_elements_located((By.CLASS_NAME, 'course-row normal faculty-BU active'))).click()
I have attached a few rows below. Any suggestions on how I can fix this?
<tr id="7858101" data-cid="7858101" data-cc="ACTG1P01" data-year="2021" data-session="FW" data-type="UG" data-subtype="UG" data-level="Year1" data-fn2_notes="BB" data-duration="2" data-class_type="ASY" data-course_section="1" data-days=" " data-class_time="" data-room1="ASYNC" data-room2="" data-location="ASYNC" data-location_desc="" data-instructor="Zhang, Xia (Celine)" data-msg="0" data-main_flag="1" data-secondary_type="E" data-startdate="1631073600" data-enddate="1638853200" data-faculty_code="BU" data-faculty_desc="Goodman School of Business">
<td ><span ></span></td>
<td >ACTG 1P01 </td>
<td ><a href="#" data-cc="ACTG1P01" data-cid="7858101">Introduction to Financial Accounting</a> <div style="display: none;"><span ></span></div></td>
<td >D2</td>
<td > </td>
<td > </td>
<!-- <td data-sort-value="1631073600">Sep 08, 2021</td> -->
<!-- <td data-sort-value="1638853200">Dec 07, 2021</td> -->
<td >ASY</td>
<td ><div style="" >
<div >
<h3>Introduction to Financial Accounting</h3>
<p >Fundamental concepts of financial accounting as related to the balance sheet, income statement and statement of cash flows. Understanding the accounting cycle and routine transactions. Integrates both theoretical and practical application of accounting concepts.</p>
<p><strong>Format:</strong> Lectures, discussion, 3 hours per week.</p>
<p><strong>Restrictions:</strong> open to BAcc majors.</p>
<p><strong>Exclusions:</strong> Completion of this course will replace previous assigned grade and credit obtained in ACTG 1P11, 1P91 and 2P51.</p>
<p><strong>Notes:</strong> Open to Bachelor of Accounting majors. </p>
</div>
<div >
<ul>
<li><strong>Duration:</strong> Sep 08, 2021 to Dec 07, 2021</li>
<li>
<strong>Location:</strong> ASYNC </li>
<li><strong>Instructor:</strong> Zhang, Xia (Celine)</li>
<li><strong>Section:</strong> 1</li>
</ul>
</div>
<hr>
</div>
</td>
</tr>
<tr id="3724102" data-cid="3724102" data-cc="ACTG1P01" data-year="2021" data-session="FW" data-type="UG" data-subtype="UG" data-level="Year1" data-fn2_notes="BB" data-duration="2" data-class_type="LEC" data-course_section="2" data-days=" M R " data-class_time="1100-1230" data-room1="GSB306" data-room2="" data-location="GSB306" data-location_desc="" data-instructor="Zhang, Xia (Celine)" data-msg="0" data-main_flag="1" data-secondary_type="E" data-startdate="1631073600" data-enddate="1638853200" data-faculty_code="BU" data-faculty_desc="Goodman School of Business">
<td ><span ></span></td>
<td >ACTG 1P01 </td>
<td ><a href="#" data-cc="ACTG1P01" data-cid="3724102">Introduction to Financial Accounting</a> <div ><span ></span></div></td>
<td >D2</td>
<td >
<table >
<thead>
<tr>
<th >S</th>
<th >M</th>
<th >T</th>
<th >W</th>
<th >T</th>
<th >F</th>
<th >S</th>
</tr>
</thead>
<tbody>
<tr>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
<td ></td>
</tr>
</tbody>
</table>
</td>
<td >1100-1230</td>
<!-- <td data-sort-value="1631073600">Sep 08, 2021</td> -->
<!-- <td data-sort-value="1638853200">Dec 07, 2021</td> -->
<td >LEC</td>
<td ></td>
</tr>
CodePudding user response:
You are almost there.
You can retrieve a list of all the relevant web elements with the driver.find_elements method and then iterate over that list, clicking each element in turn.
Since course-row normal faculty-BU active is actually several class names, not a single class name, By.CLASS_NAME will not match it; use an XPath or CSS selector instead.
Note also that presence_of_elements_located is not one of Selenium's expected conditions (the plural form is presence_of_all_elements_located), and a list of elements cannot be clicked directly.
It is better to wait with the visibility_of_element_located expected condition here rather than a presence condition, since a presence condition is fulfilled as soon as the element exists in the DOM, even if it is not yet rendered on the page, while visibility_of_element_located waits until the element is actually displayed.
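import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver is the WebDriver instance from the question
WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, '//tr[@class = "course-row normal faculty-BU active"]')))
time.sleep(0.4)  # short delay to give the remaining rows time to load
elements = driver.find_elements(By.XPATH, '//tr[@class = "course-row normal faculty-BU active"]')  # find_elements returns a list
for element in elements:
    element.click()
    # scrape the data you need here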
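The point above, that course-row normal faculty-BU active is several class names rather than one, can be sketched as a small helper that turns a class attribute value into a compound CSS selector. The helper name css_for_classes is hypothetical and not part of Selenium; the sketch also assumes the class names contain no characters that would need CSS escaping.

```python
def css_for_classes(tag: str, class_attr: str) -> str:
    """Build a compound CSS selector from a space-separated class attribute.

    Hypothetical helper for illustration; assumes plain class names that
    need no CSS escaping.
    """
    tokens = class_attr.split()  # "a b c" -> ["a", "b", "c"]
    if not tokens:
        return tag
    # Each class becomes ".name"; chaining them requires all classes at once.
    return tag + "".join("." + token for token in tokens)


ROW_SELECTOR = css_for_classes("tr", "course-row normal faculty-BU active")
print(ROW_SELECTOR)  # tr.course-row.normal.faculty-BU.active
```

The resulting string could then be passed to driver.find_elements(By.CSS_SELECTOR, ROW_SELECTOR). Unlike an exact match on the class attribute, this form does not depend on the order in which the classes are written.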
CodePudding user response:
Because the id attributes of the <tr> elements have dynamic values, to identify all the <tr> elements and click on each of them you need to induce WebDriverWait for visibility_of_all_elements_located(), using either of the following locator strategies:
Using CSS_SELECTOR:
elements = WebDriverWait(driver, 60).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "tr.course-row.normal.faculty-BU.active[data-faculty_desc='Goodman School of Business'] a[data-cc][data-cid]")))
for element in elements:
    element.click()
Using XPATH:
elements = WebDriverWait(driver, 60).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[@class='course-row normal faculty-BU active' and @data-faculty_desc='Goodman School of Business']//a[@data-cc and @data-cid]")))
for element in elements:
    element.click()
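One caveat with the XPath form: comparing @class to a full string only matches when the attribute is exactly that string, including the order of the classes. As a hedged alternative, the sketch below builds an XPath that tests each class token separately using the standard XPath 1.0 functions contains, concat and normalize-space. The helper name xpath_for_classes is hypothetical, and whether the site ever reorders or adds classes is an assumption, not something shown in the question.

```python
def xpath_for_classes(tag: str, class_attr: str) -> str:
    """Build an XPath that matches elements carrying every listed class.

    Hypothetical helper for illustration. Each token is matched as a whole
    word inside @class, so class order and extra classes do not matter.
    """
    tokens = class_attr.split()
    conditions = [
        f"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')"
        for token in tokens
    ]
    if not conditions:
        return f"//{tag}"
    return f"//{tag}[" + " and ".join(conditions) + "]"


ROW_XPATH = xpath_for_classes("tr", "course-row active")
print(ROW_XPATH)
```

The generated expression could be used in place of the exact @class comparison, for example with By.XPATH inside visibility_of_all_elements_located.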