I'm trying to filter all href links that contain the string "3080". I saw some examples, but I just can't apply them to my code. Can someone tell me how to print only those links?
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import driver_functions

gpu = '3080'
url = f'https://www.alternate.de/listing.xhtml?q={gpu}'
options = webdriver.ChromeOptions()
options.add_argument('--headless')

if __name__ == '__main__':
    browser = webdriver.Chrome(options=options, service=Service('chromedriver.exe'))
    try:
        browser.get(url)
        time.sleep(2)
        html = browser.page_source
        soup = BeautifulSoup(html, 'html.parser')
        gpu_list = soup.select("a", class_="grid-container listing")
        for link in gpu_list:
            print(link['href'])
        browser.quit()
    except:
        driver_functions.browserstatus(browser)
CodePudding user response:
You could use a CSS attribute selector with the *= (contains) operator to target hrefs, within the listings, that contain the value of the gpu variable. You can extend this selector list if you find edge cases to account for; I only looked at the URL given.
gpu_links = [i['href'] for i in soup.select(f".listing [href*='{gpu}']")]
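To see what that selector matches without starting a browser, here is a minimal sketch that runs it against a small static snippet. SAMPLE_HTML and its product paths are made up for illustration and only stand in for browser.page_source; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for browser.page_source; the live page may differ.
SAMPLE_HTML = """
<div class="grid-container listing">
  <a href="/MSI/GeForce-RTX-3080-GAMING/html/product/1">RTX 3080</a>
  <a href="/ASUS/GeForce-RTX-3070-TUF/html/product/2">RTX 3070</a>
  <a href="/Gigabyte/RTX-3080-Ti-EAGLE/html/product/3">RTX 3080 Ti</a>
</div>
<a href="/help/3080-faq">Outside the listing</a>
"""

gpu = '3080'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# [href*='3080'] matches any element whose href contains that substring.
# The leading ".listing " limits matches to descendants of the listing container,
# so the FAQ link outside the container is skipped.
gpu_links = [a['href'] for a in soup.select(f".listing [href*='{gpu}']")]

for href in gpu_links:
    print(href)
```

Note that a substring match also picks up the "3080 Ti" entry, which is the kind of edge case you may want to handle separately.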
CodePudding user response:
Try this as your selector:
gpu_list = soup.select('#lazyListingContainer > div > div > div.grid-container.listing > a')
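That selector returns every link directly inside the listing container, so you would still need to filter on the href yourself. A rough sketch of that step, assuming the nesting shown in the made-up SAMPLE_HTML below (the live page structure may differ, and the product paths are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the nesting the selector expects.
SAMPLE_HTML = """
<div id="lazyListingContainer">
  <div><div>
    <div class="grid-container listing">
      <a href="/product/rtx-3080-a">RTX 3080</a>
      <a href="/product/rtx-3070-b">RTX 3070</a>
    </div>
  </div></div>
</div>
"""

gpu = '3080'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

selector = '#lazyListingContainer > div > div > div.grid-container.listing > a'

# Collect the href of every <a> that is a direct child of the listing container.
all_links = [a['href'] for a in soup.select(selector) if a.has_attr('href')]

# Keep only the links whose href contains the search string.
matching_links = [href for href in all_links if gpu in href]

print(matching_links)
```

The has_attr check avoids a KeyError on anchors that have no href.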