Using Python 3 and Selenium 4.8.0.
Suppose I have
<p>
I love <i>pizza</i>.
</p>
After running
elem = driver.find_element(By.TAG_NAME, "p")
elem.text contains "I love pizza."
What I want, however, is to keep track of which text is italicized so that I can automatically generate a .tex file containing, e.g.
I love \textit{pizza}.
In simple cases, one option would be to find the child <i> element and use string replacement, but this breaks down if the child's text also appears elsewhere in elem, e.g. <p>I love <i>love</i> pizza.</p>.
How might I get around this?
Update: Ultimately I want the LaTeX (like the one in the question), but all I really need help with is getting to some intermediate step such as ["I love ", "pizza", "."], where I know that the pieces alternate between non-italicized and italicized. Even just getting the text back with its markup, as something like "I love <i>pizza</i>.", would be great.
CodePudding user response:
To get the markup I love <i>pizza</i>. rather than the plain text, read the element's innerHTML with get_attribute() instead of using the text attribute:
print(driver.find_element(By.TAG_NAME, "p").get_attribute("innerHTML"))
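Once you have the innerHTML string, one way to get to the intermediate step from the update and on to LaTeX is to parse it with the standard library's html.parser. The following is a minimal sketch, not the only approach: the names ItalicSegmenter, html_to_segments and segments_to_latex are made up for illustration, it treats <em> the same as <i>, it ignores every other tag, and it does not escape LaTeX special characters such as & or %.

```python
from html.parser import HTMLParser


class ItalicSegmenter(HTMLParser):
    """Collect (text, is_italic) pairs from an HTML fragment (illustrative helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # how many <i>/<em> tags are currently open
        self.segments = []  # list of (text, is_italic) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        italic = self.depth > 0
        if self.segments and self.segments[-1][1] == italic:
            # Same style as the previous chunk: merge instead of splitting.
            self.segments[-1] = (self.segments[-1][0] + data, italic)
        else:
            self.segments.append((data, italic))


def html_to_segments(inner_html):
    """Split an innerHTML string into (text, is_italic) segments."""
    parser = ItalicSegmenter()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered by the parser
    return parser.segments


def segments_to_latex(segments):
    """Wrap italic segments in \\textit{...}; no LaTeX escaping is done here."""
    return "".join(
        r"\textit{" + text + "}" if italic else text
        for text, italic in segments
    )


if __name__ == "__main__":
    print(segments_to_latex(html_to_segments("I love <i>love</i> pizza.")))
```

With Selenium you would pass in elem.get_attribute("innerHTML"), possibly after .strip(), since the markup in the question has newlines around the text. Because the italic state is tracked by position in the markup rather than by string replacement, the <p>I love <i>love</i> pizza.</p> case does not cause trouble.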