I am using Selenium with Python for scraping. I can't get certain values even though they are displayed in the browser.
I checked the HTML source and found that the values are not in it, as shown below.
HTML
<div id="pos-list-body" >
</div>
But the values are there when I inspect the same element in Chrome DevTools.
DevTools
<div id="pos-list-body" >
<div id="pos-row-1">
<div >
<input type="checkbox" value="1">
</div>
<div >
1
</div>
<div >
a
</div>
...
</div>
<div id="pos-row-2">
<div >
<input type="checkbox" value="2">
</div>
<div >
2
</div>
<div >
b
</div>
...
</div>
...
</div>
It seems that these values are generated by JavaScript or something similar.
There is no iframe in the source code.
How can I get these values with Python?
Any hint would be appreciated.
CodePudding user response:
If the ID pos-list-body is unique in the HTML DOM, your best bet is to use an explicit wait and then read the element's innerText.
Code:
wait = WebDriverWait(driver, 20)
print(wait.until(EC.presence_of_element_located((By.ID, "pos-list-body"))).get_attribute('innerText'))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
CodePudding user response:
Element.outerHTML
The outerHTML attribute of an Element gets the serialized HTML fragment describing the element including its descendants. Reading it returns a DOMString containing that serialization. Setting outerHTML replaces the element and all of its descendants with a new DOM tree constructed by parsing the given string. To obtain only the HTML of an element's contents, without the element's own tag, use the innerHTML property instead.
Solution
To get the HTML generated by JavaScript, you can use the following:
print(driver.execute_script("return document.getElementById('pos-list-body').outerHTML"))
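Once you have that HTML string, you still need to pull the values out of it. Here is a rough sketch using only the standard library html.parser. The names RowParser and parse_rows are hypothetical, and it assumes every row is a div whose id starts with pos-row-, as in the DevTools snippet from the question.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text cells of each <div id="pos-row-N"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell strings per row
        self._depth = 0   # div nesting depth inside the current row (0 = outside)

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Nested div inside a row: track depth so we know when the row ends.
            self._depth += 1
        elif (dict(attrs).get("id") or "").startswith("pos-row-"):
            # A new row starts here.
            self.rows.append([])
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._depth and text:
            self.rows[-1].append(text)


def parse_rows(html):
    """Return a list of rows, each a list of the non-empty text cells."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

You would pass it the string returned by the execute_script call above, e.g. rows = parse_rows(html). The checkbox value attributes are not collected here; only the visible text in each cell is.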