Get the proper href link using BeautifulSoup


I am writing a web scraper and am struggling to get the href link from a web page. The URL is https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp. I am trying to get the href link below:

<div>
    <a href="https://vcnewsdaily.com/Tessera Therapeutics/venture-funding.php"> &gt;&gt; Click here for more funding data on Tessera Therapeutics</a>
</div>

Here is my code:

from cgi import print_directory
import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

for link in soup.findAll(class_='mb-2'):
    links.append(link.get('href'))
print(links)

When I run the code, it outputs:

[None, None, None, None]

Can someone guide me in the right direction?

CodePudding user response:

The variable link doesn't contain an <a> tag with an href= attribute: you are iterating over the .mb-2 elements themselves. To select all <a> tags under elements with class .mb-2 you can use, for example, a CSS selector:

import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

for link in soup.select(".mb-2 a"):  # <-- select <a> tags here
    links.append(link.get("href"))
print(links)

Prints:

['https://vcnewsdaily.com/Tessera Therapeutics/venture-funding.php', 'https://vcnewsdaily.com/marketing.php']
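If you only need the funding-data link rather than every link under .mb-2, a minimal sketch (assuming the anchor text on the page still contains "funding data") is to match on the link text instead:

import re
import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp"
soup = BeautifulSoup(requests.get(URL).text, "html.parser")

# find the first <a> whose text mentions "funding data"
# (assumption: the page still uses that wording)
funding_link = soup.find("a", string=re.compile("funding data"))
if funding_link is not None:
    print(funding_link.get("href"))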

CodePudding user response:

Your code almost works; just use find('a') instead of get('href') to pick out the <a> tag inside each matched element:

import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

for link in soup.findAll(class_='mb-2'):
    links.append(link.find('a'))  # appends the <a> tag itself (or None if the element has no link)
print(links)
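
Note that this collects the <a> tag objects themselves (or None for any .mb-2 element without a link). If you want the URL strings, a small variation like the one below should work; the if guard skips elements that have no <a>:

import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/tessera-therapeutics/venture-capital-funding/rsgclpxrcp"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

for link in soup.findAll(class_='mb-2'):
    a = link.find('a')               # the <a> tag inside this element, or None
    if a is not None:                # skip .mb-2 elements with no link
        links.append(a.get('href'))
print(links)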