webdriver: can't get the broken links

I can load a URL with:

driver.get('https://www.w3.org/')

What I want to test is that when I give it a broken link, I get a page saying something like:

This page does not exist.

But when I try to detect this text, I get no match.

This attempt fails; it does not report the broken link:

link = "https://www.w3.org/fault_link"

if driver.find_elements_by_xpath("//*[contains(text(), 'This page does not exist')]"):
    logger.info("Found fault link %s", link)

This fails as well; it can't capture the text:

element = driver.find_element(By.XPATH, '//*[@id="__next"]/div[1]/main')

# when I print out the element text, I can see the output
# 404 ERROR
# This page does not exist.
# The page you are looking for could not be found.
# Go back home →
logger.info(element.text)

if element.text == 'This page does not exist.':
    logger.info("Found fault link %s", link)

This fails as well:

if search("This page does not exist.", element.text):
    logger.info("Found fault link %s", link)

Any suggestions?

CodePudding user response：

Your test fails because you are looking for text that is not on the page.
The text This page does not exist is not present on the https://www.w3.org/fault_link page.
What you should look for on that specific page is the text Document not found.
So this code works for that specific page:

from selenium.webdriver.common.by import By

url = "https://www.w3.org/fault_link"
driver.get(url)

# find_elements returns an empty (falsy) list when nothing matches
if driver.find_elements(By.XPATH, "//*[contains(text(), 'Document not found')]"):
    print("Found fault link %s" % url)

The output is:

Found fault link https://www.w3.org/fault_link

In general, each website presents its own error or notification for a non-existent page, so the text you search for has to match the specific site.
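Since the wording differs per site, one option is to keep a small list of known phrases and test the page text against all of them. The sketch below is only an illustration: the helper name looks_like_not_found and the NOT_FOUND_PHRASES tuple are made up for this example, and the two phrases are the ones quoted in this thread. It uses a substring test rather than equality, because the element text printed in the question spans several lines.

```python
from typing import Iterable

# Phrases quoted in this thread. Assumption: a real test suite would maintain
# its own list for the sites it checks.
NOT_FOUND_PHRASES = ("This page does not exist", "Document not found")


def looks_like_not_found(page_text: str,
                         phrases: Iterable[str] = NOT_FOUND_PHRASES) -> bool:
    """Return True if any known error phrase occurs anywhere in the page text."""
    text = page_text.casefold()
    return any(phrase.casefold() in text for phrase in phrases)


# Possible use with Selenium (not executed here):
#   body_text = driver.find_element(By.TAG_NAME, "body").text
#   if looks_like_not_found(body_text):
#       print("Found fault link %s" % url)
```

Matching is case-insensitive, so a heading such as "DOCUMENT NOT FOUND" would also be caught.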

CodePudding user response：

My suggestion is to do something like this. Keep in mind that I don't program in Python; I just did a quick search to assemble the example:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.python.org")
assert "Python" in driver.title

# Collect the href of every anchor on the page
elements = driver.find_elements(By.XPATH, "//a")
print(len(elements))
links = [elem.get_attribute('href') for elem in elements]
print(links)

# Request the first link directly and print its HTTP status code
x = requests.get(links[0])
print(x.status_code)

I am checking the status code of only the first link found on the page. You can loop over all the links; any link that returns a status code >= 400 is a broken link.
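A minimal sketch of that loop might look like the following. The names find_broken_links and status_via_requests are hypothetical helpers, not part of Selenium or requests. The sketch also assumes you want to skip anchors that have no href or use a non-HTTP scheme such as mailto:, and that a request which fails outright should be counted as broken.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def status_via_requests(url: str) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails."""
    import requests  # imported lazily so find_broken_links works without it

    try:
        return requests.get(url, timeout=10).status_code
    except requests.RequestException:
        return None


def find_broken_links(
    links: Iterable[Optional[str]],
    get_status: Callable[[str], Optional[int]] = status_via_requests,
) -> List[Tuple[str, Optional[int]]]:
    """Return (url, status) for every link that fails or answers with >= 400."""
    broken = []
    for url in links:
        # Skip anchors without an href and non-HTTP schemes such as mailto:
        if not url or not url.startswith(("http://", "https://")):
            continue
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

With the links list from the example above, find_broken_links(links) returns the broken URLs together with their status codes. The get_status parameter exists only so the check can be swapped out, for example for a stub in tests.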