With the following code:
data = driver.find_elements(By.XPATH, '//div[@]/span[@]')
I get HTML like the following:
<span >
  <span >Anonymous</span>
  <span >(ID:
    <span title="Highlight posts by this ID" style="background-color: rgb(228, 51, 138); color: white;">RDS8pJvL</span>)</span>
  <span title="United States" ></span>
</span>
And
<span >
  <span >Pierre</span>
  <span >!AYZrMZsavE</span>
  <span >(ID:
    <span title="Highlight posts by this ID" style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)</span>
  <span title="Australia" ></span>
</span>
Now I need to get the countries from the title attribute: "United States" and "Australia".
With the whole dataset (more than 120k entries), I was doing:
for i in data:
    country = i.find_element(By.XPATH, './/span[contains(@class,"flag")]').get_attribute('title')
But after a while I got empty entries, and I figured out that sometimes the class of the country span changes completely, from "flag something" to "bf something" or "cd something".
This is why I decided to select by position instead, taking the third span of each element:
for i in data:
    country = i.find_element(By.XPATH, './/span[3]').get_attribute('title')
But after a while I got errors again, because sometimes an extra <span >BLABLA</span> appears, which moves the country to "span[4]".
So I changed it to the following:
for i in data:
    country = i.find_element(By.XPATH, './/span[last()]').get_attribute('title')
But this last one always gives me the second-level child (the child of the posteruid span):
<span title="Highlight posts by this ID"
style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)
One thing I'm certain of: the country is ALWAYS the last child (span) at the first level of children.
I'm out of ideas, which is why I'm asking this question.
CodePudding user response:
For this particular case, you can get the titles without counting child positions. Keep the nameBlock span as the root and build an XPath that points to the direct child whose class contains the identifying token ("flag", in this case) and which carries the title. Like this:
//span[@class='nameBlock']/span[contains(@class,'flag')]
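Since the question mentions that the class sometimes changes from "flag something" to "bf something", a variant that does not depend on the class at all may be more robust: select the direct child span that has a title attribute. The sketch below is only an illustration under assumptions. It uses the standard library's xml.etree.ElementTree on a simplified copy of the markup instead of a live Selenium session, and the class values on the country spans are placeholders taken from the wording of the question.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the two blocks from the question.
# Only "nameBlock" comes from the answers; the classes on the country
# spans are placeholders based on the question's description.
SAMPLE = """
<div>
  <span class="nameBlock">
    <span>Anonymous</span>
    <span>(ID: <span title="Highlight posts by this ID">RDS8pJvL</span>)</span>
    <span title="United States" class="flag something"></span>
  </span>
  <span class="nameBlock">
    <span>Pierre</span>
    <span>!AYZrMZsavE</span>
    <span>(ID: <span title="Highlight posts by this ID">y5EgihFc</span>)</span>
    <span title="Australia" class="bf something"></span>
  </span>
</div>
"""

root = ET.fromstring(SAMPLE)

# "/span[@title]" uses the child axis, so the nested ID span (a grandchild
# of nameBlock) is not matched even though it also has a title.
countries = [
    span.get("title")
    for span in root.findall("./span[@class='nameBlock']/span[@title]")
]
print(countries)
```

With Selenium, the corresponding relative lookup inside the loop would be i.find_element(By.XPATH, './span[@title]'), assuming no other direct child of the block carries a title attribute.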
CodePudding user response:
Use the following XPath to always identify the last child span of each parent.
//span[@class='nameBlock']/span[last()]
Code block.
for country in driver.find_elements(By.XPATH, "//span[@class='nameBlock']/span[last()]"):
    print(country.get_attribute("title"))
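A likely explanation for the behaviour described in the question: './/span[last()]' searches all descendants and keeps every span that is the last span of its own parent, so the nested ID span qualifies too, and find_element returns the first match in document order. Using the child axis ('./span[last()]') restricts the test to the first level. The sketch below illustrates the difference with the standard library's xml.etree.ElementTree on a simplified copy of one block; it is not Selenium itself, and the markup is trimmed to what the question shows.

```python
import xml.etree.ElementTree as ET

# Simplified copy of one block from the question; classes other than
# "nameBlock" are left out because the original markup does not show them.
SAMPLE = """
<div>
  <span class="nameBlock">
    <span>Pierre</span>
    <span>!AYZrMZsavE</span>
    <span>(ID: <span title="Highlight posts by this ID">y5EgihFc</span>)</span>
    <span title="Australia"></span>
  </span>
</div>
"""

root = ET.fromstring(SAMPLE)
block = root.find("./span[@class='nameBlock']")

# Descendant search: every span that is the last span of its own parent.
# The nested ID span comes first in document order.
descendant_hits = [s.get("title") for s in block.findall(".//span[last()]")]

# Child axis: only the last span directly under the block.
child_hit = block.find("./span[last()]").get("title")

print(descendant_hits)
print(child_hit)
```

In the question's loop, the same idea would be i.find_element(By.XPATH, './span[last()]').get_attribute('title'), which relies on the stated guarantee that the country is always the last first-level span.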