I have this HTML from surfing heat pages that I am trying to scrape:
<div >
<div id="heat-85940" >
<div id="heat-85941" >
<div id="heat-85942" >
<div id="heat-85943" >
<div>
I have a loop that scrapes the heats on each page, but because the heat IDs change from page to page (i.e. they don't always start at 85940), I can only scrape one page without manually changing the range I loop over.
For one page, my code looks like this:
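from selenium.webdriver.common.by import By

# find_elements returns a list, so take len() of the list itself (a list has no .text)
heat_count = len(driver.find_elements(By.CLASS_NAME, 'new-heat-hd-name'))
for h in range(heat_count):
    for i in range(4):
        name = driver.find_element(By.XPATH, f'//*[@id="heat-8594{h}"]/div/div[2]/div[{i + 1}]/div[1]/div[1]/div/div[2]/div[1]/span').text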
I'm looking for a way to search the HTML for the first heat ID (heat-85940 here) and start from there, instead of finding it manually for each page.
CodePudding user response:
You can try this:
I am only writing the starting section of the XPath here, i.e. the part that handles the dynamic value 'id="heat-85940"'. Please fill in the remaining XPath yourself, because you didn't post the URL or the full HTML source.
driver.find_element(By.XPATH, ".//*[starts-with(@id,'heat-')]...<remaining XPath until the element>")
or
driver.find_element(By.XPATH, ".//*[starts-with(@id,'heat-8594')]...<remaining XPath until the element>")
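If you want the actual starting ID, as the question asks, rather than only matching on the prefix, one option is to read the IDs out of the page source. This is a minimal sketch, assuming the HTML is available as a string (for example from driver.page_source). The names sample_html and heat_ids are hypothetical, and the sample markup is just the snippet from the question with closing tags added.

```python
import re

# Stand-in for driver.page_source; the ids are copied from the question's snippet.
sample_html = '''
<div>
<div id="heat-85940"></div>
<div id="heat-85941"></div>
<div id="heat-85942"></div>
<div id="heat-85943"></div>
</div>
'''

def heat_ids(html):
    """Return the numeric part of every id="heat-N" attribute, in document order."""
    return [int(n) for n in re.findall(r'id="heat-(\d+)"', html)]

ids = heat_ids(sample_html)
# The first id on the page, or None if the page has no heats.
first_id = ids[0] if ids else None
print(first_id, ids)
```

Looping over ids directly avoids assuming the heats are numbered consecutively, which the question does not guarantee.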
CodePudding user response:
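You can try something like:
# XPath alternative: heats = driver.find_elements(By.XPATH, '//*[starts-with(@id,"heat-")]')
heats = driver.find_elements(By.CSS_SELECTOR, '*[id^="heat-"]')
for heat in heats:
    # The leading "." keeps the search relative to this heat element;
    # an XPath starting with "/" would search from the document root instead
    names = heat.find_elements(By.XPATH, './div/div[2]/div/div[1]/div[1]/div/div[2]/div[1]/span')
    for n in names[:4]:
        name = n.text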
or
# XPath alternative: heats = driver.find_elements(By.XPATH, '//*[starts-with(@id,"heat-")]')
heats = driver.find_elements(By.CSS_SELECTOR, '*[id^="heat-"]')
for heat in heats:
    for i in range(4):
        # "./" makes the path relative to the current heat; XPath positions are 1-based, hence i + 1
        name = heat.find_element(By.XPATH, f'./div/div[2]/div[{i + 1}]/div[1]/div[1]/div/div[2]/div[1]/span').text
(I can't test these without more of your HTML, so I'm not fully confident in any of them.)
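If the relative XPaths turn out to be awkward, another option is to read each heat's id attribute and plug it back into the absolute XPath from the question. A rough sketch: name_xpath is a hypothetical helper, the path tail is copied from the question and is just as untested, and the Selenium usage is shown only as comments because it needs a live driver.

```python
def name_xpath(heat_id, slot):
    """Build the question's absolute XPath for one surfer slot (1-based) in a heat.

    heat_id is the full id attribute value, e.g. "heat-85940".
    """
    return (
        f'//*[@id="{heat_id}"]/div/div[2]/div[{slot}]'
        '/div[1]/div[1]/div/div[2]/div[1]/span'
    )

# With Selenium this might be used roughly as follows (not run here):
#   for heat in driver.find_elements(By.CSS_SELECTOR, '*[id^="heat-"]'):
#       heat_id = heat.get_attribute("id")
#       for slot in range(1, 5):
#           name = driver.find_element(By.XPATH, name_xpath(heat_id, slot)).text

print(name_xpath("heat-85940", 1))
```

This keeps the original XPath intact and only swaps in whatever id the page actually uses, so nothing has to be edited by hand per page.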