I'm using Selenium to scrape a web page for product model numbers. The page shows the products in two grid sections with a card between them. I can grab the model numbers from the first section, "browse-search-pods-1", but I can't access the elements in the second section, "browse-search-pods-2", on the bottom half of the page. There are 24 products, but my code only returns the 12 from the first section. How can I access both sections?
Here's the website: https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts
Here's a sample of the HTML, with one product from each section:
<div >
<section id="browse-search-pods-1" >
<div data-lg-name="Product Pod: 0">
<div data-automation-id="podnode" data-type="product">
<div >
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" target="_blank" rel="noopener noreferrer" >More Options</a>
<div >
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" >
<div >
<h2 ><span >USG Sheetrock Brand</span><span >1/2 in. x 4 ft. x 8 ft. UltraLight Drywall</span></h2>
</div>
</a>
</div>
<div >
<div >
<div >Model# 14113411708</div>
</div>
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div >
<div title=""><span style="width:89.80600000000001%"></span></div>
<span >
(<!-- -->3753<!-- -->)
</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
<section id="browse-search-pods-2" >
<div data-lg-name="Product Pod: 0">
<div >
<section >
<div >
<h2 >Project Guide</h2>
<p >Installing Drywall Project Guide</p>
</div>
<div >
<div ><img src="https://www.homedepot.com/hdus/en_US/DTCCOMNEW/fetch/FetchRules/FetchPN/how-to-install-drywall-professional-steps-HT-PG-BM.jpg" alt="" height="1" width="1" loading="lazy"></div>
<div >
<div >Hanging drywall is not difficult if you have patience, the right tools and a friend to help. Follow our instructions to learn more</div>
<div ><a href="//www.homedepot.com/c/how_to_install_drywall_professional_steps_HT_PG_BM"><span >Read Our Guide</span></a></div>
</div>
</div>
</section>
<section >
<div >
<h2 >Buying Guide</h2>
<p >Types of Drywall</p>
</div>
<div >
<a href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">
<div style="background-image: url('https://i3.ytimg.com/vi/4hF9_z3IqaA/mqdefault.jpg');"></div>
</a>
</div>
<a href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">See Our Tips</a>
</section>
</div>
</div>
<div >
<div data-automation-id="podnode" data-type="product">
<div >
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" target="_blank" rel="noopener noreferrer" >More Options</a>
<div >
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" >
<div >
<h2 ><span >Westpac Materials</span><span >18 lb. Fast Set 20 Lite Setting-Type Joint Compound</span></h2>
</div>
</a>
</div>
<div >
<div >
<div >Model# 22165H</div>
</div>
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div >
<div title=""><span style="width: 94.16%;"></span></div>
<span >(226)</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
</div>
Here's the code I tried. It targets the second section, but it still returns the model numbers from the first:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = Options()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts')
section_two = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
product_model = section_two.find_elements(By.XPATH, "//div[contains(@class, 'product-identifier product-identifier__model')]")
for model in product_model:
    print(model.text)
CodePudding user response:
The second section is probably loaded lazily, so try scrolling to the element browse-search-pods-2 first and then locate it:
section_two = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
For scrolling, the Java org.openqa.selenium.interactions.Actions API is mirrored in Python by the ActionChains class:
from selenium.webdriver.common.action_chains import ActionChains
element = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
Alternatively, you can scroll the element into view with JavaScript's scrollIntoView():
driver.execute_script("arguments[0].scrollIntoView();", element)
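Here is a minimal sketch that combines the scrolling suggestion above with a scoped lookup, in case it helps. It assumes the sections keep the ids browse-search-pods-1 and browse-search-pods-2 and that the model labels carry the class used in the question. collect_models is a hypothetical helper, not part of Selenium; it takes the already created driver and passes the locator strategy as the plain string "xpath", which is the value of By.XPATH. Note the leading "." in the inner XPath: on a WebElement, an expression that starts with "//" searches the whole document, which may be why the question's code prints the first section's models.

```python
# Sketch only: collect_models and its parameter names are hypothetical.
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH

# Class string taken from the question; the leading "." scopes the search
# to the element that find_elements is called on.
MODEL_XPATH = (
    ".//div[contains(@class, "
    "'product-identifier product-identifier__model')]"
)


def collect_models(driver, section_ids):
    """Scroll each section into view, then read the model labels inside it."""
    models = []
    for section_id in section_ids:
        section = driver.find_element(XPATH, f"//section[@id='{section_id}']")
        # Scrolling gives lazily rendered product pods a chance to load.
        driver.execute_script("arguments[0].scrollIntoView();", section)
        for label in section.find_elements(XPATH, MODEL_XPATH):
            text = label.text.strip()
            if text:
                # Labels look like "Model# 22165H"; keep only the number.
                models.append(text.replace("Model#", "").strip())
    return models
```

Called as collect_models(driver, ["browse-search-pods-1", "browse-search-pods-2"]), this should return the model numbers from both sections. If the second section is still empty right after scrolling, an explicit wait (for example with the WebDriverWait already imported in the question) may be needed before reading the labels.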