I was recently writing a program to scrape links from "https://news.ycombinator.com/", but I've tried many methods, and whenever I request the link, it returns None.
import requests
from bs4 import BeautifulSoup
response = requests.get('https://news.ycombinator.com/')
soup = BeautifulSoup(response.text, 'html.parser')
links = soup.select('.titleline')
print(links[0].get('href'))
CodePudding user response:
That's because you got a list of 'span' tags instead of 'a' tags. A 'span' has no href attribute, so .get('href') returns None.
You should change the line to:
links = soup.select('.titleline > a')
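To see the difference without hitting the network, here is a minimal sketch on a hand-written snippet. The HTML string is an assumption that only imitates one story title from the page; the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one story title; not copied from the live page.
html = '<span class="titleline"><a href="https://example.com/story">Story</a></span>'
soup = BeautifulSoup(html, "html.parser")

# '.titleline' matches the <span>, which carries no href attribute,
# so .get('href') falls back to its default of None.
span = soup.select(".titleline")[0]
print(span.name, span.get("href"))

# '.titleline > a' matches the <a> directly inside that span.
anchor = soup.select(".titleline > a")[0]
print(anchor.name, anchor.get("href"))
```

The first print shows the tag is a span with no href; the second shows the a tag with its link.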
CodePudding user response:
You have to select the <a> to get the value of its attribute. Based on the selection in your question you only get the <span>, and this does not have an href attribute, so you have to reference the <a> inside it:
links = soup.select('.titleline')
print(links[0].a.get('href'))
or, even better, select it directly:
# descendant combinator: matches any <a> that has a .titleline ancestor
links = soup.select('.titleline a')
# or
# child combinator: matches only an <a> whose parent is .titleline
links = soup.select('.titleline > a')
print(links[0].get('href'))
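The two selectors can return different results when the title span holds more than one link. A small sketch, assuming a hand-written snippet in which a second <a> sits inside a nested span (the markup and the "sitebit" class name here are illustrative assumptions, not taken from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one direct child link plus one link nested a level deeper.
html = """
<span class="titleline">
  <a href="https://example.com/story">Story</a>
  <span class="sitebit"><a href="from?site=example.com">example.com</a></span>
</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <a> anywhere below .titleline.
descendant_hrefs = [a.get("href") for a in soup.select(".titleline a")]

# Child combinator: only an <a> whose direct parent is .titleline.
child_hrefs = [a.get("href") for a in soup.select(".titleline > a")]

print(descendant_hrefs)
print(child_hrefs)
```

If you only want the story link itself, the child combinator is the safer choice in a case like this, since the descendant form also picks up the nested link.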