BeautifulSoup4: find multiple href links containing specific text

Time:11-11

I'm trying to filter all href links that contain the string "3080". I've seen some examples, but I can't apply them to my code. Can someone tell me how to print only those links?

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import driver_functions

gpu = '3080'
url = f'https://www.alternate.de/listing.xhtml?q={gpu}'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

if __name__ == '__main__':
    browser = webdriver.Chrome(options=options, service=Service('chromedriver.exe'))
    try:

        browser.get(url)

        time.sleep(2)

        html = browser.page_source

        soup = BeautifulSoup(html, 'html.parser')

        gpu_list = soup.select("a", class_="grid-container listing")

        for link in gpu_list:
            print(link['href'])

        browser.quit()
    except:
        driver_functions.browserstatus(browser)

CodePudding user response:

You could use a CSS attribute selector with the * (contains) operator to target hrefs, within the listing, that contain the value of the gpu variable. You can extend this selector if you find edge cases to account for; I only looked at the URL given.

gpu_links = [i['href'] for i in soup.select(f".listing [href*='{gpu}']")]
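
To see what that selector does without launching a browser, here is a minimal sketch that runs it against a small inline HTML snippet. The markup and the hrefs in it are made up for illustration and are not taken from the real page; only the selector itself comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real listing page (assumption).
SAMPLE_HTML = """
<div class="grid-container listing">
  <a href="/product/rtx-3080-gaming">RTX 3080 Gaming</a>
  <a href="/product/rtx-3070-eagle">RTX 3070 Eagle</a>
  <a href="/product/rtx-3080-ti-suprim">RTX 3080 Ti Suprim</a>
</div>
<a href="/help/3080-faq">FAQ outside the listing</a>
"""

gpu = "3080"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [href*='3080'] matches any element whose href contains that substring;
# the leading .listing restricts matches to descendants of the listing container.
gpu_links = [a["href"] for a in soup.select(f".listing [href*='{gpu}']")]

for href in gpu_links:
    print(href)
```

The FAQ link also contains "3080" but is skipped because it sits outside the .listing container, which is the reason for scoping the selector rather than using [href*='3080'] on its own.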

CodePudding user response:

Try this as your selector:

gpu_list = soup.select('#lazyListingContainer > div > div > div.grid-container.listing > a')
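
Note that this selector returns every link in the listing, so the "3080" check still has to happen somewhere. A rough sketch of one way to do that in Python follows, again using invented markup whose nesting is only assumed to mirror the selector's path:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the nesting is an assumption that mirrors the selector path.
SAMPLE_HTML = """
<div id="lazyListingContainer">
  <div><div>
    <div class="grid-container listing">
      <a href="/product/rtx-3080-gaming">RTX 3080 Gaming</a>
      <a href="/product/rtx-3070-eagle">RTX 3070 Eagle</a>
    </div>
  </div></div>
</div>
"""

gpu = "3080"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The child-combinator selector picks up every <a> directly inside the listing.
all_links = soup.select(
    "#lazyListingContainer > div > div > div.grid-container.listing > a"
)

# Keep only hrefs that contain the search string; .get avoids a KeyError
# for anchors that have no href attribute.
filtered = [a["href"] for a in all_links if gpu in a.get("href", "")]

for href in filtered:
    print(href)
```

A long child-combinator path like this depends on the exact page structure, so it may need adjusting if the site's layout changes.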
