How to get hyperlinks with specific text from a webpage using BeautifulSoup?

Time:03-19

I want to find all hyperlinks whose text includes "article" on https://www.geeksforgeeks.org/, for example these two at the bottom of the page:

Write an Article
Improve an Article

I want to collect all of these hyperlinks and print them, so I tried:

import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.geeksforgeeks.org/'

reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, "html.parser")
links = []
for link in soup.find_all('a', href=True):
    href = link.get("href")  # the link target, e.g. "/some-page/"

    # keep links whose href ends with "/article"
    if re.search('/article$', href):
        links.append(href)

print(links)

However, the result is an empty list, []. How can I solve this?

Answer:

Here is something you can try. Note that the page you linked contains more links with the text "article" than the two you mentioned, but this shows the general idea of how to deal with it.

In this case I just check whether the word "article" appears in the text of each a tag. You could use a regex search there instead, but for this example it is overkill.
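If you do want the regex variant, here is a minimal sketch. It runs against a small hypothetical HTML snippet rather than the live page, and the pattern r"\barticle\b" is an assumption: the word boundaries mean "articles" would not match, unlike a plain substring check.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
html = """
<a href="/write">Write an Article</a>
<a href="/improve">Improve an <b>Article</b></a>
<a href="/about">About Us</a>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"\barticle\b", re.IGNORECASE)

# get_text() also includes text inside nested tags such as <b>
matches = soup.find_all(
    lambda tag: tag.name == "a" and bool(pattern.search(tag.get_text()))
)
print([a["href"] for a in matches])  # ['/write', '/improve']
```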

import requests
from bs4 import BeautifulSoup

url = 'https://www.geeksforgeeks.org/'
res = requests.get(url)

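if res.status_code != 200:
    # stop early instead of parsing an error page
    raise SystemExit(f"Request failed with status {res.status_code}")

soup = BeautifulSoup(res.content, "html.parser")

# match <a> tags whose visible text contains "article" (case-insensitive)
links_with_article = soup.find_all(lambda tag: tag.name == "a" and "article" in tag.text.lower())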
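To see why the code in the question returned [], here is an offline sketch comparing the two checks on one hypothetical link (the href below is made up, not taken from the real site). The question's regex '/article$' tests whether the href ends with "/article", whereas the filter above tests the link text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; real hrefs on the live site may differ
html = '<a href="https://example.com/contribute/">Write an Article</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

# The question's check: does the href END with "/article"?
print(bool(re.search('/article$', link["href"])))  # False

# The answer's check: does the link TEXT contain "article"?
print("article" in link.text.lower())  # True
```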

EDIT:

If you know that the word appears in the href, i.e. in the link URL itself, you can use a CSS attribute selector:

soup.select("a[href*=article]")

This selects every a element whose href attribute contains the substring "article".
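A small offline sketch of the difference, using hypothetical markup: the selector looks only at the attribute value, so a link whose text says "Article" but whose href does not contain the word is not selected.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real page
html = """
<a href="/write-an-article/">Write</a>
<a href="/contribute/">Write an Article</a>
"""
soup = BeautifulSoup(html, "html.parser")

# [href*=article] is a substring match on the href value only
hrefs = [a["href"] for a in soup.select("a[href*=article]")]
print(hrefs)  # ['/write-an-article/']
```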

EDIT: to get only the href values:

hrefs = [link.get('href') for link in links_with_article]
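One thing to watch for: the text-based filter does not require an href, so link.get('href') can return None for some matches, and many hrefs are relative paths. A minimal sketch that skips such tags and resolves relative paths with urllib.parse.urljoin before printing; the sample markup is hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://www.geeksforgeeks.org/"

# Hypothetical sample markup, including an <a> without an href
html = """
<a href="/write/">Write an Article</a>
<a>Article without href</a>
<a href="https://example.com/improve/">Improve an Article</a>
"""
soup = BeautifulSoup(html, "html.parser")
links_with_article = soup.find_all(lambda tag: tag.name == "a" and "article" in tag.text.lower())

# Skip tags without an href and turn relative paths into absolute URLs
hrefs = [urljoin(base_url, a["href"]) for a in links_with_article if a.has_attr("href")]
for href in hrefs:
    print(href)
```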