Is there an easy way to extract the text from the HTML source without losing its structure (specifically line breaks and spaces)?
Currently, I am extracting text as follows:
page_title_element = driver.find_element_by_xpath("x-path")
page_title = page_title_element.text
However, this method distorts the structure of the text.
I am using Python and Selenium.
Edit:
I am essentially trying to extract the data from the whole page (complete text data of HTML pages) and not from individual tags.
CodePudding user response:
You need to access the element's source, i.e. its innerHTML, as you would in JavaScript. A Selenium WebElement in Python has no innerHTML property (and no .source attribute either), but you can read the value with get_attribute.
Here's how to do it:
page_title_element = driver.find_element_by_xpath("x-path")
page_title = page_title_element.get_attribute("innerHTML")
CodePudding user response:
To get the markup of the whole page, select the root html element and read its innerHTML:
data = driver.find_element_by_xpath("//html").get_attribute("innerHTML")
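Note that innerHTML returns markup, not plain text. If the goal is plain text that keeps line breaks and spaces, one option is to parse that markup yourself. Below is a minimal sketch using only the standard library; the names TextExtractor and html_to_text are made up for illustration, and the set of tags treated as line-breaking is an assumption you would likely need to adjust for your pages.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags should start/end a line. Extend as needed.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Content of these tags is not visible text, so it is dropped.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text and adds newlines at block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Spaces inside text nodes are kept exactly as they appear.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse consecutive newlines and trim them from both ends.
    return re.sub(r"\n{2,}", "\n", text).strip("\n")
```

With Selenium you could pass it the string from the answer above, or driver.page_source, for example: text = html_to_text(driver.page_source).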