Selenium - How to get the text from an element but retaining child element source

Time:01-24

Using Python 3 and Selenium 4.8.0.

Suppose I have

<p>
    I love <i>pizza</i>.
</p>

Having done

elem = driver.find_element(By.TAG_NAME, "p")

elem.text will contain "I love pizza."

What I want, however, is to somehow retain the information of what text is italicized such that I can automatically generate a .tex file containing, e.g.

I love \textit{pizza}.

In simple cases, one option would be to find the child <i> element and use string replace methods, but this leads to obvious problems if the child text is contained elsewhere in elem, e.g. <p>I love <i>love</i> pizza.</p>.

How might I get around this?

Update: Ultimately I want the LaTeX (like the one in the question), but all I really need help with is getting to some intermediate step, such as ["I love ", "pizza", "."], where I know that the items alternate between non-italicized and italicized. Even just getting the text back with its markup, as something like "I love <i>pizza</i>.", would be great.

Answer:

To get the text with its markup, I love <i>pizza</i>., rather than the plain string that the text attribute returns, read the element's innerHTML attribute:

print(driver.find_element(By.TAG_NAME, "p").get_attribute("innerHTML"))
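Once you have the innerHTML string, one way to reach the intermediate step from the question is to parse it instead of using string replacement. The following is a minimal sketch, not a definitive implementation: it uses only the standard library's html.parser, the names ItalicSplitter, split_italics and to_latex are made up for illustration, and it assumes that only <i> and <em> mark italics and that the text contains no characters needing LaTeX escaping (such as & or %).

```python
from html.parser import HTMLParser


class ItalicSplitter(HTMLParser):
    """Collect (text, is_italic) runs from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0   # how many <i>/<em> tags are currently open
        self.runs = []   # list of (text, is_italic) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        italic = self.depth > 0
        if self.runs and self.runs[-1][1] == italic:
            # Same style as the previous run: merge so runs alternate.
            self.runs[-1] = (self.runs[-1][0] + data, italic)
        else:
            self.runs.append((data, italic))


def split_italics(inner_html):
    """Return the fragment as a list of (text, is_italic) runs."""
    parser = ItalicSplitter()
    parser.feed(inner_html)
    parser.close()
    return parser.runs


def to_latex(inner_html):
    """Wrap italic runs in \\textit{...} and collapse whitespace."""
    parts = [
        r"\textit{" + text + "}" if italic else text
        for text, italic in split_italics(inner_html)
    ]
    return " ".join("".join(parts).split())


print(split_italics("I love <i>pizza</i>."))
# [('I love ', False), ('pizza', True), ('.', False)]
print(to_latex("I love <i>love</i> pizza."))
# I love \textit{love} pizza.
```

Because the parser tracks where each tag opens and closes, the repeated word in <p>I love <i>love</i> pizza.</p> is not a problem. With Selenium you would pass the result of get_attribute("innerHTML") to to_latex.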