I'm trying to scrape multiple pages. Each page lists some measures, but they are not in the same order on every page, so on each page I have to check which measure is which. I tried to get the parent node of the text SO, NO and CO to identify the element and then put its value in the right place. This is the HTML document:
<ul >
<li>
<p >SO₂</p>
<h1 >0.00</h1>
<strong>ppb</strong><p>2022/06/13 07:00</p>
</li>
<li>
<p >NO₂</p>
<h1 >1.00</h1>
<strong>ppb</strong><p>2022/06/26 20:00</p>
</li>
<li>
<p >CO</p>
<h1 >0.00</h1>
<strong>ppb</strong>
<p>2021/07/07 04:00</p>
</li>
</ul>
I've tried something like this:
elements_name = ['PM10','PM2.5',"PM1","CO","SO","O","NO"]
for element in elements_name:
    driver.find_element_by_xpath(f"//ul[@class='sc-hHftDr gOAyWd']//li//p[contains(., {element})]").find_element_by_xpath("parent::node()").find_element_by_css_selector('h1[]').text.strip()
The problem is that parent::node() returns the 'SO' element for every name in elements_name; it never gets the right parent of the node. I also tried ('..') and ('parent::li').
CodePudding user response:
I think the contains() function will return an unexpected result at least sometimes, because e.g. contains('SO₂', 'O') is true, and so is contains('PM10', 'PM1'). I think you should just use the = operator instead of contains(). Note that with = the names in your list have to match the page text exactly, e.g. 'SO₂' rather than 'SO'.
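To make the difference concrete, here is a small sketch in plain Python rather than XPath. It assumes that Python's `in` operator behaves like contains() and `==` behaves like the = operator for these strings. The label list is an assumption built from the sample HTML and the names in the question, and both helper functions are hypothetical names used only for illustration.

```python
# Labels as they would appear on the page. The first three come from the
# sample HTML in the question; the PM names are assumed from the question's list.
page_labels = ["SO₂", "NO₂", "CO", "PM10", "PM2.5", "PM1"]


def substring_matches(name, labels):
    """Labels accepted by a substring test, like XPath contains(label, name)."""
    return [label for label in labels if name in label]


def exact_matches(name, labels):
    """Labels accepted by an equality test, like XPath label = name."""
    return [label for label in labels if label == name]


print(substring_matches("O", page_labels))    # ['SO₂', 'NO₂', 'CO']
print(substring_matches("PM1", page_labels))  # ['PM10', 'PM1']
print(exact_matches("PM1", page_labels))      # ['PM1']
print(exact_matches("SO", page_labels))       # [] because the page says 'SO₂'
```

The substring test is ambiguous for short names, while the equality test returns at most the one label you asked for, provided the name is spelled exactly as on the page.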
You should be able to use a single XPath expression. Something like this:
driver.find_element_by_xpath(
    f"//ul[@class='sc-hHftDr gOAyWd']"
    f"/li[p[@class='sc-bkzZxe card__subtitle']='{element}']"
    f"/h1[@class='sc-idOhPF ipvImd card__highlight-text']"
).text.strip()
This expression will:
- search the entire document for the ul,
- select the child li whose subtitle exactly matches (not contains!) the element parameter,
- return the h1 child of that li.
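The same three steps can be tried offline against the sample HTML from the question. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, so it is not the code you would run against the live page. The class attributes are left out because they do not appear in the sample markup, and `read_measure` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, without class attributes.
SAMPLE = """
<ul>
  <li><p>SO₂</p><h1>0.00</h1><strong>ppb</strong><p>2022/06/13 07:00</p></li>
  <li><p>NO₂</p><h1>1.00</h1><strong>ppb</strong><p>2022/06/26 20:00</p></li>
  <li><p>CO</p><h1>0.00</h1><strong>ppb</strong><p>2021/07/07 04:00</p></li>
</ul>
"""


def read_measure(root, label):
    """Return the h1 text of the li that has a p child equal to `label`.

    Returns None when no li on the page carries that label.
    Labels containing a single quote are not handled in this sketch.
    """
    h1 = root.find(f"./li[p='{label}']/h1")
    return h1.text.strip() if h1 is not None else None


root = ET.fromstring(SAMPLE)  # root is the <ul> element

# The lookup order is independent of the order of the li elements on the page.
readings = {label: read_measure(root, label) for label in ["CO", "NO₂", "SO₂", "PM10"]}
print(readings)
```

Because each value is looked up by its label, the order of the li elements no longer matters, and a measure that is missing from a page comes back as None instead of raising an error.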