from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
#import re
req = Request("https://www.indiegogo.com/individuals/23489031")
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
print(links)
This code works for a single URL, but I don't know how to extend it to multiple URLs. How do I do the same thing for a list of URLs?
CodePudding user response:
I haven't ever used bs4, but you may be able to just create a list containing all the URLs you want to check, then use a loop to iterate over it and work on each URL separately. Like:
urls = ["https://", "https://", "http://"]  # but with actual links
for link in urls:
    # work with each link separately here
    pass
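Applied to the original question, that loop could look like the sketch below. The function name `extract_links` and the `pages` dict are made up for illustration; the inline HTML strings stand in for pages you would normally fetch with `urlopen()`, so the example runs without network access:

```python
from bs4 import BeautifulSoup

def extract_links(html):
    """Return every href found in one HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get('href') for a in soup.find_all('a')]

# Stand-ins for pages fetched with urlopen(Request(url))
pages = {
    "page1": '<a href="/about">About</a><a href="/contact">Contact</a>',
    "page2": '<a href="https://example.com">Home</a>',
}

# One list of links per page, keyed by page name
all_links = {name: extract_links(html) for name, html in pages.items()}
print(all_links)
```

Keeping the results in a dict (rather than one flat list) lets you tell which page each link came from.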
CodePudding user response:
Here is a small script I once wrote for a scraping task. You can adapt it to what you want to achieve; I hope it helps.
import requests
from bs4 import BeautifulSoup as bs

url_list = ['https://www.example1.com', 'https://www.example2.com']

def getlinks(url):
    r = requests.get(url)
    links = []
    for a in bs(r.text, 'html.parser').find_all('a'):
        href = a.attrs.get('href', '')
        # prepend the page's scheme and host to relative links
        if href.startswith(('http://', 'https://')):
            links.append(href)
        else:
            links.append(f'{url}{href}')
    return links
You can loop through url_list and call getlinks(url) on each entry.
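That final loop could be sketched as below. `getlinks` is stubbed out here with hardcoded return values so the example runs without network access; in practice you would use the requests/bs4 version:

```python
def getlinks(url):
    # stub standing in for the real requests + BeautifulSoup version,
    # so this sketch runs offline
    return [f'{url}/home', f'{url}/about']

url_list = ['https://www.example1.com', 'https://www.example2.com']

# collect the links per URL so you know which page each came from
all_links = {url: getlinks(url) for url in url_list}
for url, links in all_links.items():
    print(url, '->', len(links), 'links')
```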