How can I locate the "Copyright" text?

Time:12-25

I'm new to Selenium and I'm trying to locate a piece of text ("Copyright") on a website. If it exists, the program should print "success"; otherwise it should print "failed". The problem is that I don't know how to locate that text: it has no class and no id, and I don't have a CSS selector for it. I have no idea where to start.

Here's the code I wrote; I made a few attempts:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

PATH = Service("/Users/fscozano/documenti/chromedriver-2.exe")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://apod.nasa.gov/apod/random_apod.html")

try:
    # search1 = driver.find_element(By.XPATH, "/html/body/center[2]/b[2]")
    copyr = driver.find_element(By.NAME, "Copyright").text
    # search = WebDriverWait(driver, 1).until(

    print(copyr)
    print("success")

except:
    print("failed")
    time.sleep(3)

How should I find "Copyright" when I know nothing about the element and it's part of a larger block of text? You can look at the website yourself to see where the problem is.

CodePudding user response:

There are a few issues with your code/approach.

  1. The text you want is inside an IFRAME. To allow Selenium to see inside an IFRAME, you need to switch context to that IFRAME. Best practice is to wait for the IFRAME to be available and then switch to it. See the docs.

  2. Once you are inside that IFRAME, you can search for the desired element(s). One issue is that you haven't clearly defined what exactly you want as the output. Your code is attempting to retrieve the .text from an element whose name attribute is "Copyright", which doesn't exist. Each time the page reloads, a different image is displayed and...

    1. It may or may not be copyrighted
    2. The copyright may or may not be a hyperlink
    3. The copyright may contain one or more links

    To keep things simple, I'm going to give you code that gets all the text in the CENTER tag that contains the "Copyright" text, when it exists, e.g.,

    Our Rotating Earth
    Video Credit & Copyright: Bartosz Wojczyński
    
  3. You should use .find_elements instead of try-except: it returns an empty list when nothing matches, so no exceptions are thrown or caught. Exceptions should be exceptional (rare) and not (generally) be used as flow control.
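Point 3 can be sketched as a small helper. This is only an illustration: the name first_match_text is hypothetical, and the one thing it relies on is that the driver's find_elements(by, value) returns a list that is empty when nothing matches.

```python
def first_match_text(driver, by, locator):
    """Return the text of the first element matching the locator, or None.

    `driver` is anything exposing Selenium's find_elements(by, value), which
    returns an empty list (rather than raising) when nothing matches.
    """
    matches = driver.find_elements(by, locator)
    if not matches:
        # Nothing matched: report that with None instead of an exception
        return None
    return matches[0].text
```

With a real driver this would be called as first_match_text(driver, By.XPATH, "//center[.//b[contains(.,'Copyright')]]"), and the caller prints "success" or "failed" depending on whether the result is None.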

Here's the updated code.

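from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://apod.nasa.gov/apod/random_apod.html")

# Wait for the IFRAME to be available, then switch Selenium's context into it
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "apod")))

# find_elements returns an empty list (no exception) when nothing matches
copyr = driver.find_elements(By.XPATH, "//center[.//b[contains(.,'Copyright')]]")
if copyr:
    print(copyr[0].text)  # all the text of the matching CENTER element
    print("success")
else:
    print("failed")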

I tried this a number of times and it worked, but there may be pages out there that break it.
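If only the copyright line is wanted rather than the whole CENTER block, the text can be filtered after it has been retrieved. The following is a minimal sketch, not part of the answer's code: copyright_line is a hypothetical helper, and it assumes the element's text comes back with the title and the credit on separate lines, as in the example above.

```python
def copyright_line(block_text, keyword="Copyright"):
    """Return the first line of block_text that contains keyword, or None."""
    for line in block_text.splitlines():
        if keyword in line:
            return line.strip()
    # No line mentions the keyword (e.g. the image is not copyrighted)
    return None


# Sample text taken from the example in the answer
sample = "Our Rotating Earth\nVideo Credit & Copyright: Bartosz Wojczyński"
print(copyright_line(sample))
```

The match is a plain, case-sensitive substring test, so a page that writes the word differently would return None.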

CodePudding user response:

  1. You are using the wrong locator.
  2. You should add a wait so that the element is completely loaded before you access it.
  3. The element you are trying to access is out of view, so you need to scroll the page down.

This should work:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.action_chains import ActionChains

PATH = Service("/Users/fscozano/documenti/chromedriver-2.exe")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)

driver.get("https://apod.nasa.gov/apod/random_apod.html")
# Wait until the first link after the bold "Credit" label is present
credit = wait.until(EC.presence_of_element_located((By.XPATH, "//b[contains(text(),'Credit')]/following-sibling::a")))
# Scroll that link into view and give the page a moment to settle
actions.move_to_element(credit).perform()
time.sleep(0.5)
# Print the text of every credit link
credit_links = driver.find_elements(By.XPATH, "//b[contains(text(),'Credit')]/following-sibling::a")
for link in credit_links:
    print(link.text)