Extract single URL from webpage by scraping

Time:03-28

I have been trying to scrape a website such as the one below. Its footer contains several social media links, and the LinkedIn URL is the only one I care about. Is there a way to extract just that link, perhaps with a regex or another Python library?

This is what I have tried so far -

import requests
from bs4 import BeautifulSoup
url = "https://www.southcoast.org/"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

But I'm fetching all the URLs instead of the one I'm looking for.

Note: I'd appreciate generic code that I can reuse for other sites as well.

Thanks in advance for your suggestions/help.

CodePudding user response:

One approach is to use a CSS selector that matches only the a elements whose href attribute contains the string linkedin.com/company/:

soup.select_one('a[href*="linkedin.com/company/"]')['href']

Example

import requests
from bs4 import BeautifulSoup
url = "https://www.southcoast.org/"
req = requests.get(url)
soup = BeautifulSoup(req.text,"html.parser")

# single (first) link, or None if nothing matches (the := operator needs Python 3.8+)
link = e['href'] if (e := soup.select_one('a[href*="linkedin.com/company/"]')) else None
# all matching links as a list (empty if nothing matches)
links = [a['href'] for a in soup.select('a[href*="linkedin.com/company/"]')]
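
Since the question asks for something reusable across sites, here is a minimal sketch of a more general helper. It is not part of the answer above: it swaps the substring selector for a check on the parsed hostname, and it uses only the standard library (html.parser and urllib.parse) so that it runs without a network call. The names LinkCollector and find_links_for_domain, the sample markup, and the example.org base URL are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_links_for_domain(html, domain, base_url=""):
    """Return links in `html` whose host is `domain` or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        host = (urlparse(absolute).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(absolute)
    return matches


# Hypothetical footer markup, used here instead of a live request.
sample_html = """
<footer>
  <a href="https://www.facebook.com/example">Facebook</a>
  <a href="https://www.linkedin.com/company/example-org/">LinkedIn</a>
  <a href="/about?ref=linkedin.com">About</a>
</footer>
"""

links = find_links_for_domain(sample_html, "linkedin.com", "https://example.org/")
first_link = links[0] if links else None
print(first_link)
```

Comparing hostnames means a link that merely mentions linkedin.com in its query string (the third anchor in the sample) is not picked up. With a live page you would pass req.text as html and the page URL as base_url. The same hostname check should also work with BeautifulSoup by looping over soup.select('a[href]') instead of using the collector class.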