I have a program that takes the text from a website using the following code:
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"\chromedriver.exe")

def get_raw_input(link_input, website_input, driver):
    driver.get(f'{website_input}')
    try:
        # Click the link at this XPath, then read the text of the <pre> element.
        here_button = driver.find_element_by_xpath('/html/body/div[2]/h3/a')
        here_button.click()
        raw_data = driver.find_element_by_xpath('/html/body/pre').text
    except Exception:
        # Otherwise keep polling until the element with class "output" is found.
        move_on = False
        while move_on == False:
            try:
                raw_data = driver.find_element_by_class_name('output').text
                move_on = True  # assignment, so the loop can actually end
            except Exception:
                pass
    driver.close()
    return raw_data
The section of text it is targeting is formatted like so:
englishword tab frenchword
However, the return value I get is in this format:
englishword space frenchword
The English part of the text could be a phrase with spaces in it, so I cannot simply .split(" "),
since that may split the phrase as well.
My end goal is to keep the tab between the two parts instead of a space so I can .split("\t"),
which would make later manipulation easier.
Any help would be greatly appreciated :)
CodePudding user response:
Selenium returns an element's text the way the browser renders it, so it typically "normalizes" whitespace (each run of inner whitespace characters, tabs included, turns into a single space).
You can see some discussion here. The solution suggested by the Selenium maintainers for getting the text with its original spacing is to query the element's textContent
property.
Here is an example:
raw_data = driver.find_element_by_class_name('output').get_property('textContent')
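Once the tabs survive in raw_data, the splitting the question asks about could look roughly like the sketch below. The helper name parse_tab_pairs and the sample string are made up for illustration, and it assumes each line of the page holds one "english<TAB>french" entry, which may not match the real page exactly.

```python
def parse_tab_pairs(raw_text):
    """Split tab-separated lines into (english, french) tuples.

    Hypothetical helper: assumes one 'english<TAB>french' entry per line.
    """
    pairs = []
    for line in raw_text.splitlines():
        # Skip blank lines and any line that has no tab separator.
        if "\t" not in line:
            continue
        # Split on the first tab only; spaces inside a phrase are left alone.
        english, french = line.split("\t", 1)
        pairs.append((english.strip(), french.strip()))
    return pairs


# Made-up sample standing in for the textContent of the 'output' element.
sample = "good morning\tbonjour\ncat\tchat\n"
print(parse_tab_pairs(sample))
# → [('good morning', 'bonjour'), ('cat', 'chat')]
```

Because the split is on "\t" rather than " ", a multi-word English phrase such as "good morning" stays in one piece.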