I am trying to find the location of tables in a webpage where I do not have the ID, XPath, or class name of the table. I identify the table by measuring the similarity between the table I want and the tables present in the webpage. However, I get an incorrect location and size for the table when I use element.size / element.location. Is there a solution, or am I doing something wrong in the following?
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
# Resize the window to the full page height so the screenshot covers the whole page
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
Using this code I am able to locate the correct/desired table, but the location and size returned for the element are incorrect.
CodePudding user response:
I think the table may not have finished loading when you read its size and location.
Selenium drives a live browser, so on a dynamically rendered page the values can change until the page has settled.
Here are a few things I would try.
time.sleep()
import time

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
time.sleep(10) # <------------------------------------------------
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        time.sleep(10) # <------------------------------------------------
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
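A fixed sleep is either too long or too short. As a rough alternative, you could poll the element until its geometry stops changing. The sketch below assumes the element exposes Selenium's `rect` property (a dict with x, y, width, and height); `wait_for_stable_rect` is a hypothetical helper of mine, not part of Selenium, and "two identical consecutive reads" is only a heuristic for "finished loading".

```python
import time


def wait_for_stable_rect(element, timeout=10.0, interval=0.25):
    """Poll element.rect until two consecutive reads match, or time out.

    `element` is anything with a `rect` property, e.g. a Selenium WebElement.
    """
    deadline = time.monotonic() + timeout
    previous = dict(element.rect)
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = dict(element.rect)
        if current == previous:
            # Geometry did not change between two reads: treat it as settled.
            return current
        previous = current
    raise TimeoutError("element rect did not settle within %.1f s" % timeout)
```

Inside the loop you would call `rect = wait_for_stable_rect(table)` in place of the second `time.sleep(10)` and read `rect['x']`, `rect['y']`, `rect['width']`, and `rect['height']`.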
location_once_scrolled_into_view
You can try scrolling the page to the table before trying to get its location and size.
table.location_once_scrolled_into_view
th = table.size['height']
tw = table.size['width']
tx = table.location['x']
ty = table.location['y']
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
## remove
# fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
# driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        table.location_once_scrolled_into_view # <-----------------------
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
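One detail that may matter here: as far as I understand, `location_once_scrolled_into_view` returns coordinates relative to the visible viewport after scrolling, whereas `location` is relative to the whole page. If you compare against a screenshot of the viewport, it may be worth keeping the returned value instead of discarding it. A minimal sketch, where `viewport_box_after_scroll` is a hypothetical helper name:

```python
def viewport_box_after_scroll(element):
    """Scroll element into view and return (x, y, width, height) in CSS pixels.

    x and y come from location_once_scrolled_into_view, so they are assumed
    to be relative to the visible viewport rather than to the whole page.
    """
    point = element.location_once_scrolled_into_view  # scrolls as a side effect
    size = element.size
    return point["x"], point["y"], size["width"], size["height"]
```

Take the screenshot after this call so that the scroll position in the image matches the coordinates.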
Use non-headless mode
Note that the location and size of an element in a headless browser may differ from those in a non-headless browser.
chrome_options = Options()
## remove
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
If you have tried all of these and none of them works, try extracting the size yourself and adjusting the values as needed.
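As one way of extracting the geometry yourself, you could ask the browser directly via `getBoundingClientRect()` and scale by `window.devicePixelRatio`. This is only a sketch under assumptions: that the mismatch comes from CSS pixels versus screenshot pixels (the question does not confirm this), and that the screenshot shows the viewport at the same scroll position. `css_rect_to_pixels` and `viewport_box` are hypothetical helper names; only `driver.execute_script` is a Selenium call.

```python
def css_rect_to_pixels(rect, ratio):
    """Convert a viewport-relative CSS-pixel rect to (left, top, right, bottom)
    in device pixels, given the device pixel ratio."""
    left = round(rect["left"] * ratio)
    top = round(rect["top"] * ratio)
    right = round((rect["left"] + rect["width"]) * ratio)
    bottom = round((rect["top"] + rect["height"]) * ratio)
    return left, top, right, bottom


def viewport_box(driver, element):
    """Read the element's bounding box from the browser and scale it.

    `driver` needs an execute_script(script, *args) method, as a Selenium
    WebDriver has; `element` is passed through as arguments[0].
    """
    rect, ratio = driver.execute_script(
        "const r = arguments[0].getBoundingClientRect();"
        "return [{left: r.left, top: r.top, width: r.width, height: r.height},"
        " window.devicePixelRatio];",
        element,
    )
    return css_rect_to_pixels(rect, ratio)
```

The resulting tuple has the (left, top, right, bottom) order that image libraries commonly expect for a crop box, so you can check it visually against sample.png.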
Hope this helps.