from bs4 import BeautifulSoup
import requests

# Parse the saved page and print the href of every <a> tag
with open("htmlviewer.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

gp = soup.find_all("a")
for link in gp:
    bs = link.get('href')
    print(bs)
I am using this code to extract links from a page's source code, and here is my output:
None
https://support.google.com/websearch/answer/181196?hl=en-IN
None
https://www.google.com/webhp?hl=en&ictx=2&sa=X&ved=0ahUKEwj88YTzkL_7AhX9TGwGHZQpBVEQPQgJ
https://chromedriver.chromium.org/
/search?rlz=1C1CHBD_enIN1032IN1032&sxsrf=ALiCzsZzV82nGh7PsFzltlGMqVaKe-JR2Q:1669028827453&q=What is a Chrome WebDriver?&sa=X&ved=2ahUKEwj88YTzkL_7AhX9TGwGHZQpBVEQzmd6BAgUEAU
https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/
https://splinter.readthedocs.io/en/latest/drivers/chrome.html
I want all the links in one single list or dictionary.
If I do this:
    bs = {link.get('href')}
I get every single link in its own new dictionary. Can anyone help? I am new to coding.
Also, how do I select only the links starting with https and ignore the ones starting with /search? I know these are very stupid questions, but I am only a week into learning Python.
CodePudding user response:
First create an empty list outside the for loop, e.g. links = [],
and then inside your for loop call links.append(link.get("href")) so that every href ends up in that one list.
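
Here is a minimal sketch of that, which also covers the https part of the question. The inline HTML string is a made-up stand-in for htmlviewer.html, and the names href and https_links are my own. Skipping None values (tags without an href) is an extra assumption on my part, not something you asked for:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of htmlviewer.html
html = """
<a>no href here</a>
<a href="https://chromedriver.chromium.org/">ChromeDriver</a>
<a href="/search?q=What+is+a+Chrome+WebDriver">Search</a>
<a href="https://splinter.readthedocs.io/en/latest/drivers/chrome.html">Splinter</a>
"""

soup = BeautifulSoup(html, "html.parser")

links = []  # created once, before the loop
for link in soup.find_all("a"):
    href = link.get("href")  # None when the tag has no href attribute
    if href is not None:
        links.append(href)

# Keep only links that start with https; relative ones such as /search?... are dropped
https_links = [href for href in links if href.startswith("https")]

print(links)
print(https_links)
```

To run it on your own file, swap the html string for the with open("htmlviewer.html") block from the question and pass fp to BeautifulSoup as before.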