Getting list of all the URLs in a Closed Issue page in GitHub using Selenium

I am trying to store the links of all the closed issues from a GitHub project (https://github.com/mlpack/mlpack/issues?q=is:issue is:closed) using Selenium. I use the code below:

repo_closed_url = [link.find_element(By.CLASS_NAME,'h4').get_attribute('href') for link in driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]')]

However, the above code only returns the first URL. How can I get all the URLs on that page? I already iterate through all the pages myself, so getting the links from a single page is enough.

CodePudding user response：

Please try this; it should work:

repo_closed_url = [link.get_attribute('href') for link in driver.find_elements(By.XPATH,"//div[@aria-label='Issues']//a[contains(@class,'h4')]")]

Here the XPath //div[@aria-label='Issues']//a[contains(@class,'h4')] directly locates all the desired title elements on the page. Your original locator matched only the single container div, and find_element returns just the first matching descendant of it, which is why you got one URL.
The rest of the line iterates over the returned elements and extracts their href attributes, as I explained in the previous question.
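To make the difference between the two patterns concrete, here is a minimal browser-free sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the markup is a hypothetical, heavily simplified imitation of the issue list (not GitHub's real HTML). ElementTree has no contains(), so the sketch matches the class exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the issue list markup
# (an assumption for illustration; GitHub's real HTML is more complex).
PAGE = """
<html><body>
  <div aria-label="Issues">
    <a class="h4" href="/mlpack/mlpack/issues/3371">First</a>
    <a class="h4" href="/mlpack/mlpack/issues/3370">Second</a>
    <a class="h4" href="/mlpack/mlpack/issues/3369">Third</a>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Pattern from the question: iterate over the containers (there is only one)
# and take the FIRST title link inside each of them.
containers = root.findall(".//div[@aria-label='Issues']")
first_only = [c.find(".//a[@class='h4']").get("href") for c in containers]

# Pattern from the answer: select every title link directly.
all_links = [
    a.get("href")
    for a in root.findall(".//div[@aria-label='Issues']//a[@class='h4']")
]

print(len(containers), first_only)
print(all_links)
```

The first pattern yields one href per container, so a single container gives a single URL; the second yields one href per link.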

CodePudding user response：

Try the below XPath:

//div[@aria-label='Issues']//a[contains(@id,'issue')]

This XPath matches all the closed issues on page 1. Call .get_attribute('href') on each matched element to get the URLs.
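As a rough sketch of that last step, the helper below calls get_attribute('href') on every match and keeps the unique, non-empty values. The name collect_hrefs and the FakeElement/FakeDriver classes are hypothetical; the fakes exist only so the sketch runs without a browser. With Selenium you would pass a real WebDriver and By.XPATH (whose value is the string "xpath").

```python
def collect_hrefs(driver, by, locator):
    """Return unique, non-empty href values of all elements matching the locator."""
    hrefs = []
    for element in driver.find_elements(by, locator):
        href = element.get_attribute("href")
        if href and href not in hrefs:  # skip missing hrefs and duplicates
            hrefs.append(href)
    return hrefs


# Hypothetical stand-ins that mimic the two Selenium methods used above.
class FakeElement:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, locator):
        # A real driver would evaluate the locator; the fake returns canned elements.
        return list(self._elements)


fake = FakeDriver([
    FakeElement("https://github.com/mlpack/mlpack/issues/3371"),
    FakeElement(None),  # a matched element without an href
    FakeElement("https://github.com/mlpack/mlpack/issues/3370"),
    FakeElement("https://github.com/mlpack/mlpack/issues/3371"),  # duplicate
])
issue_xpath = "//div[@aria-label='Issues']//a[contains(@id,'issue')]"
urls = collect_hrefs(fake, "xpath", issue_xpath)
print(urls)
```

Filtering out empty values and duplicates is a design choice of this sketch, not something the XPath itself requires.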

CodePudding user response：

To extract the values of all the href attributes, you need to use WebDriverWait with visibility_of_all_elements_located(), and you can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get("https://github.com/mlpack/mlpack/issues?q=is:issue is:closed")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id^='issue_'] a[id^='issue']")))])

Using XPATH:

driver.get("https://github.com/mlpack/mlpack/issues?q=is:issue is:closed")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id, 'issue_')]//a[starts-with(@id, 'issue')]")))])

Console Output:

['https://github.com/mlpack/mlpack/issues/3371', 'https://github.com/mlpack/mlpack/issues/3370', 'https://github.com/mlpack/mlpack/issues/3369', 'https://github.com/mlpack/mlpack/issues/3368', 'https://github.com/mlpack/mlpack/issues/3367', 'https://github.com/mlpack/mlpack/issues/3365', 'https://github.com/mlpack/mlpack/issues/3364', 'https://github.com/mlpack/mlpack/issues/3363', 'https://github.com/mlpack/mlpack/issues/3356', 'https://github.com/mlpack/mlpack/issues/3353', 'https://github.com/mlpack/mlpack/issues/3352', 'https://github.com/mlpack/mlpack/issues/3351', 'https://github.com/mlpack/mlpack/issues/3348', 'https://github.com/mlpack/mlpack/issues/3340', 'https://github.com/mlpack/mlpack/issues/3338', 'https://github.com/mlpack/mlpack/issues/3336', 'https://github.com/mlpack/mlpack/issues/3333', 'https://github.com/mlpack/mlpack/issues/3329', 'https://github.com/mlpack/mlpack/issues/3326', 'https://github.com/mlpack/mlpack/issues/3325', 'https://github.com/mlpack/mlpack/issues/3324', 'https://github.com/mlpack/mlpack/issues/3323', 'https://github.com/mlpack/mlpack/issues/3319', 'https://github.com/mlpack/mlpack/issues/3314', 'https://github.com/mlpack/mlpack/issues/3303']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
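
Since the question mentions iterating through all the pages, here is a small sketch of how the URL passed to driver.get() could be built per page with a properly encoded query string. The helper name closed_issues_url is hypothetical, and the use of a page query parameter for pagination is an assumption about GitHub's issue list, so verify it against the site before relying on it.

```python
from urllib.parse import urlencode

BASE_URL = "https://github.com/mlpack/mlpack/issues"


def closed_issues_url(page=1):
    """Build the closed-issues URL for a given page number.

    Assumption: the issue list is paginated through a `page` query parameter.
    """
    # urlencode escapes the colon and turns the space into '+'.
    query = urlencode({"q": "is:issue is:closed", "page": page})
    return f"{BASE_URL}?{query}"


print(closed_issues_url(2))
```

Each generated URL can then be loaded with driver.get() before running one of the locators above.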