Extracting multiple text using partial href information

Time:04-19

I'm trying to extract multiple genres from the site below (I already know the URLs): https://www.discogs.com/master/1515454-Zedd-Katy-Perry-365

<div>
  <h1 id="profile_title">...</h1>
  <div>Genre:</div>
  <div>
    <a href="/genre/electronic">Electronic</a>
    ", "
    <a href="/genre/pop">Pop</a>
  </div>
</div>


And here's my Python code:

genre = None
try:
  genre = driver.find_element_by_xpath("[contains(concat(' ', @class, ' '), ' profile ')]//*[contains(@href, ' /genre/* '").text

How do I extract the genres as text (e.g. Electronic, Pop)?

CodePudding user response:

To extract and print the Genre values (Electronic, Pop, etc.) from the page, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), then read the text of each element. You can use the following locator strategy:

  • Using XPATH:

    driver.get("https://www.discogs.com/master/1515454-Zedd-Katy-Perry-365")
    # Dismiss the OneTrust cookie-consent banner once its accept button is clickable
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    # Wait until the links in the first cell after the "Genre" row header are visible
    genre_links = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr/th[@scope='row' and contains(., 'Genre')]//following::td[1]//a")))
    print([my_elem.text for my_elem in genre_links])
    
  • Console Output:

    ['Electronic', 'Pop']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
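
The idea in the question (select links by a partial href, then read their text) can also be checked offline, without a browser. The following is only a minimal sketch using Python's standard html.parser, not the Selenium approach from the answer. The names GenreLinkParser, extract_genres and SAMPLE_HTML are hypothetical, and the /style/house link is an invented non-matching entry added to show that only /genre/ links are kept. With Selenium, the same prefix test could presumably be expressed as the CSS selector a[href^='/genre/'], though that has not been verified against the live page.

```python
from html.parser import HTMLParser


class GenreLinkParser(HTMLParser):
    """Collects the text of every <a> whose href starts with a given prefix."""

    def __init__(self, href_prefix="/genre/"):
        super().__init__()
        self.href_prefix = href_prefix
        self.genres = []            # one entry per matching link
        self._inside_match = False  # True while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._inside_match = href.startswith(self.href_prefix)
            if self._inside_match:
                self.genres.append("")

    def handle_data(self, data):
        # Only accumulate text that sits inside a matching link
        if self._inside_match:
            self.genres[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_match = False


def extract_genres(html, href_prefix="/genre/"):
    """Return the stripped link texts for hrefs beginning with href_prefix."""
    parser = GenreLinkParser(href_prefix)
    parser.feed(html)
    parser.close()
    return [g.strip() for g in parser.genres if g.strip()]


# Simplified markup modelled on the question; the /style/ link is an assumption
SAMPLE_HTML = """
<div>
  <div>Genre:</div>
  <div>
    <a href="/genre/electronic">Electronic</a>,
    <a href="/genre/pop">Pop</a>
    <a href="/style/house">House</a>
  </div>
</div>
"""

print(", ".join(extract_genres(SAMPLE_HTML)))  # prints: Electronic, Pop
```

Matching on the href prefix rather than on class names keeps the lookup independent of the page's styling, which may matter if the site's class attributes change.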
    