xpath how to get the last value of first level of children in the case of the number of children is

Time:05-19

With the following code:

data = driver.find_elements(By.XPATH, '//div[@]/span[@]')

I get elements with HTML like the following:

<span >
  <span >Anonymous</span>
  <span >(ID:
    <span  title="Highlight posts by this ID" style="background-color: rgb(228, 51, 
    138); color: white;">RDS8pJvL</span>)</span>
  <span title="United States" ></span>
</span>

And

<span >
  <span >Pierre</span>
  <span >!AYZrMZsavE</span>
  <span >(ID:
    <span  title="Highlight posts by this ID" 
    style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)</span>
  <span title="Australia" ></span>
</span>

Now I need to get the countries from the title attributes: "United States" and "Australia".

With the whole dataset (more than 120k entries), I was doing:

for i in data:
 country = i.find_element(By.XPATH, './/span[contains(@class,"flag")]').get_attribute('title')

But after a while I got empty entries, and I figured out that sometimes the class of the country span changes completely, from "flag something" to "bf something" or "cd something".

This is why I decided to go with the last child of each element:

for i in data:
 country = i.find_element(By.XPATH, './/span[3]').get_attribute('title')

But after a while I got errors again, because sometimes an extra <span >BLABLA</span> shows up and shifts the country to "span[4]".

So I changed it to the following:

for i in data:
 country = i.find_element(By.XPATH, './/span[last()]').get_attribute('title')

But this one always gives me a second-level child (the posteruid child):

 <span  title="Highlight posts by this ID" 
        style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)

One thing I'm certain of: the country is ALWAYS the last child (span) among the first-level children.

I'm out of ideas, which is why I'm asking this question.

CodePudding user response:

For this particular case, you can get the titles without counting child nodes. Keep the nameBlock span as the root and write an XPath that points to the direct child whose class marks the element carrying the title (flag, in this case). Like this:

//span[@class='nameBlock']/span[contains(@class,'flag')]


CodePudding user response:

Use the following XPath to select the last first-level span child of each nameBlock parent. The single slash restricts the match to direct children, so the nested ID span (which also has a title attribute) is not picked up.

//span[@class='nameBlock']/span[last()]

Code block:

for country in driver.find_elements(By.XPATH, "//span[@class='nameBlock']/span[last()]"):
    print(country.get_attribute("title"))
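
The difference between the descendant axis (.//span[last()]) and the child axis (./span[last()]) can be checked offline, without a browser. The sketch below is only an illustration: it uses Python's built-in xml.etree.ElementTree, which supports a limited XPath subset, instead of Selenium, and the sample markup is a simplified, well-formed version of the HTML in the question. The class name nameBlock is taken from the answers and is an assumption about the real page.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the markup in the question.
# The class name "nameBlock" is an assumption taken from the answers.
SAMPLE = """
<div>
  <span class="nameBlock">
    <span>Anonymous</span>
    <span>(ID: <span title="Highlight posts by this ID">RDS8pJvL</span>)</span>
    <span title="United States"></span>
  </span>
  <span class="nameBlock">
    <span>Pierre</span>
    <span>!AYZrMZsavE</span>
    <span>(ID: <span title="Highlight posts by this ID">y5EgihFc</span>)</span>
    <span title="Australia"></span>
  </span>
</div>
"""

root = ET.fromstring(SAMPLE)
blocks = root.findall("./span[@class='nameBlock']")

# .//span[last()] matches every descendant span that is the last span
# of its own parent; the first hit in document order is the nested ID span.
descendant_first = [b.find(".//span[last()]").get("title") for b in blocks]

# ./span[last()] only looks at direct children, so it yields the country span.
child_last = [b.find("./span[last()]").get("title") for b in blocks]

print(descendant_first)
print(child_last)
```

With Selenium, the same relative expression would presumably be used per element, for example i.find_element(By.XPATH, './span[last()]') inside the loop over data.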