How to make a webpage stop loading and extract text from it

Time:03-07

I want to extract the text from a URL shortener page using this code:


    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    os.environ['PATH']  = 'C:/Selenium Drivers'
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    driver.get('https://pastebin.com/vkuagfwV')
    strings = str(driver.find_element(By.CLASS_NAME, 'textarea').text)
    strings = strings.replace("\n", " ")
    driver.close()
    
    print(strings)

But this code does not work until I manually stop the web page from loading. I tried using XPath as well, but that didn't work either.

Answer:

Instead of implicitly_wait, try an explicit wait with the visibility_of_element_located expected condition.
Also, as mentioned in the comments, the str() cast is unnecessary because .text already returns a string.
Please try this:

    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    os.environ['PATH'] = 'C:/Selenium Drivers'
    driver = webdriver.Chrome()
    # Explicit wait: poll for the condition for up to 20 seconds
    wait = WebDriverWait(driver, 20)

    driver.get('https://pastebin.com/vkuagfwV')
    # Wait until the element is visible, then read its text (already a str)
    strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text

    strings = strings.replace("\n", " ")
    driver.close()

    print(strings)
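
If driver.get() itself keeps blocking because the page never finishes loading, one possible workaround is to cap the page load time and then stop the page from script, which mimics pressing the browser's Stop button. The sketch below is only an illustration: the fetch_text and flatten_text helper names and the timeout values are assumptions, not part of the original answer, and how the browser behaves after a load timeout can vary between driver versions.

```python
def flatten_text(text):
    """Replace newlines with spaces, as the replace() call in the answer does."""
    return text.replace("\n", " ")


def fetch_text(url, class_name="textarea", load_timeout=15, wait_timeout=20):
    """Load url, give up on slow page loads, and return the element's text.

    Hypothetical helper: the name and the default timeouts are assumptions.
    """
    # Selenium is imported here so that defining the helpers does not
    # require Selenium or a browser to be available.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        # Make driver.get() raise TimeoutException instead of blocking for long.
        driver.set_page_load_timeout(load_timeout)
        try:
            driver.get(url)
        except TimeoutException:
            # Ask the browser to stop loading whatever is still pending.
            driver.execute_script("window.stop();")

        # The element may already be in the DOM even though loading was cut short.
        element = WebDriverWait(driver, wait_timeout).until(
            EC.visibility_of_element_located((By.CLASS_NAME, class_name))
        )
        return flatten_text(element.text)
    finally:
        # Always shut the browser down, even if the wait fails.
        driver.quit()
```

Calling fetch_text('https://pastebin.com/vkuagfwV') would then return the text as a single line, provided the element appears within the wait timeout.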

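Update:
With the default page load strategy, driver.get() blocks until the page has finished loading, so the explicit wait does not even start until then. Please set pageLoadStrategy to eager, which makes driver.get() return as soon as the DOM is ready, without waiting for images, stylesheets, and subframes.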

    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    os.environ['PATH'] = 'C:/Selenium Drivers'
    # Copy the capabilities so the shared class-level dict is not modified
    caps = DesiredCapabilities.CHROME.copy()
    caps["pageLoadStrategy"] = "eager"
    # Note: desired_capabilities and executable_path are deprecated in Selenium 4
    driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')

    wait = WebDriverWait(driver, 20)

    driver.get('https://pastebin.com/vkuagfwV')
    strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text

    strings = strings.replace("\n", " ")
    driver.close()

    print(strings)
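
With Selenium 4, the same eager strategy can be set on an Options object instead of DesiredCapabilities. This is a minimal sketch assuming Selenium 4 is installed; the build_eager_options and make_driver helper names are hypothetical, and the driver path is left to the caller.

```python
def build_eager_options():
    """Return Chrome options with the eager page load strategy.

    Hypothetical helper name; assumes Selenium 4.
    """
    # Imported here so that defining the helper does not require Selenium.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # "eager": driver.get() returns once the DOM is ready,
    # without waiting for images, stylesheets, and subframes.
    options.page_load_strategy = "eager"
    return options


def make_driver(driver_path=None):
    """Start Chrome with eager loading; driver_path is optional."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Pass an explicit chromedriver path only if the caller supplied one.
    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=build_eager_options())
```

The driver returned by make_driver() can then be used with WebDriverWait and visibility_of_element_located exactly as in the code above.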