Extract specific links after extracting from BeautifulSoup

Time:09-28

I previously extracted some information from this webpage using BeautifulSoup4: https://www.peakbagger.com/list.aspx?lid=5651

That gave me a list of a tags with href attributes:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.peakbagger.com/list.aspx?lid=5651'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

a = soup.select("a:nth-of-type(1)")
a

But I only want the ones whose links start with 'peak.aspx?pid=10...'

How do I print only the ones with 'peak.aspx?pid=10...'? Do I need to use a loop, or split the strings?

Thanks.

CodePudding user response:

An approach could be to loop over your selection and just pick the links that contain the string peak.aspx?pid=:

[x['href'] for x in soup.select('a') if 'peak.aspx?pid=' in x.get('href', '')]
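
Since the question asks for links that start with a given prefix, a stricter variant is to test the href itself with startswith. The following is only a sketch: SAMPLE_HTML is made-up markup standing in for the real page (the pid values are invented), and links_with_prefix is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (assumption).
SAMPLE_HTML = """
<table class="gray">
  <tr><td>1.</td><td><a href="peak.aspx?pid=10640">Peak A</a></td></tr>
  <tr><td>2.</td><td><a href="peak.aspx?pid=20001">Peak B</a></td></tr>
</table>
<a href="list.aspx?lid=5651">Back to list</a>
<a name="top">anchor without href</a>
"""


def links_with_prefix(soup, prefix):
    """Return href values that start with prefix.

    href=True skips <a> tags that have no href attribute at all.
    """
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a["href"].startswith(prefix)
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
peak_links = links_with_prefix(soup, "peak.aspx?pid=10")
print(peak_links)
```

With the real page, the same helper could be called on the soup built from urlopen(url) instead of SAMPLE_HTML.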

You can also make the selector more specific. This one returns only the a tags in the second column of the table:

soup.select('table.gray tr td:nth-of-type(2) a')

To get the links you have to loop over the result:

[x['href'] for x in soup.select('table.gray tr td:nth-of-type(2) a')]
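
The prefix filter can also be pushed into the selector itself with the CSS attribute selector [href^="..."], which matches attribute values that begin with the given text. The sketch below assumes a BeautifulSoup version whose select() supports this (4.7 or later) and uses made-up sample markup instead of the live page; it also turns the relative hrefs into absolute URLs with urljoin, which is an optional extra step rather than something the question asked for.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.peakbagger.com/list.aspx?lid=5651"

# Hypothetical sample markup standing in for the real page (assumption).
SAMPLE_HTML = """
<table class="gray">
  <tr><td>1.</td><td><a href="peak.aspx?pid=10640">Peak A</a></td></tr>
  <tr><td>2.</td><td><a href="peak.aspx?pid=20001">Peak B</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Second-column links whose href begins with the wanted prefix.
anchors = soup.select('table.gray td:nth-of-type(2) a[href^="peak.aspx?pid=10"]')

# Resolve each relative href against the page URL.
absolute_links = [urljoin(BASE_URL, a["href"]) for a in anchors]
print(absolute_links)
```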