Source website:
<div class="content">
<h1>Example 1</h1>
<p>Example 2</p>
<h3>Example 3</h3>
</div>
My Code:
content=driver.find_elements(By.XPATH,'//div[@id="content"]/h1')
full_content=""
for des in content:
    full_content += '\n\n' + des.text
suggest=[page_link,full_content]
print(suggest)
I don't want to scrape everything inside the 'content' class, only the text from certain tags such as h1 and h3, but I want all of it collected in full_content. Can I do this with Selenium?
CodePudding user response:
The div has a class, not an id. If your HTML example is correct, then //div[@class="content"]/h1 is the correct XPath.
content=driver.find_elements(By.XPATH,'//div[@class="content"]/h1')
CodePudding user response:
The <h1> element is a direct child of its parent <div>, so to scrape the text from specific tags you can use either of the following locator strategies:
innerText from the <h1> tag:
print(driver.find_element(By.XPATH, "//div[@class='content']/h1").text)
innerText from the <h3> tag:
print(driver.find_element(By.XPATH, "//div[@class='content']//h3").text)
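If you want both headings collected into one full_content string, as the question asks, one option is a single XPath that matches only the <h1> and <h3> children, followed by a join. This is a minimal sketch, not the only way to do it. It assumes the div really has class="content". The names HEADINGS_XPATH, join_texts and scrape_headings are made up for illustration, and the SimpleNamespace objects are stand-ins for WebElements so the joining step can be tried without a browser.

```python
from types import SimpleNamespace

# Matches only the <h1> and <h3> children of the div, in document order.
# Assumes the container is <div class="content">.
HEADINGS_XPATH = '//div[@class="content"]/*[self::h1 or self::h3]'


def join_texts(elements):
    """Join the .text of each element, separated by a blank line."""
    return "\n\n".join(el.text for el in elements)


def scrape_headings(driver):
    """Return the text of the matching headings as one string."""
    # Imported here so join_texts can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    return join_texts(driver.find_elements(By.XPATH, HEADINGS_XPATH))


# Stand-in elements (hypothetical) that only carry a .text attribute.
fake_elements = [SimpleNamespace(text="Example 1"),
                 SimpleNamespace(text="Example 3")]
demo = join_texts(fake_elements)
```

With a real driver you would call full_content = scrape_headings(driver) and then build suggest = [page_link, full_content] as before. The <p> text is skipped because the predicate only accepts h1 and h3.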