Webscrape - Getting link/href

Time:12-08

I am trying to scrape a webpage and get the href (link) for each row.

Currently, the code prints nothing.

The expected output is the href (link) of each row on the page.

import requests
from bs4 import BeautifulSoup

url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'

baseurl='https://ash.confex.com/ash/2021/webprogram/'

res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')


productlist = soup.find_all('div',class_='session-card')

for b in productlist:
    links = b["href"]
    print(links)


Answer:

What happens?

First of all, take a closer look at your soup: you won't find the information you're searching for, because your request is being blocked.
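This is also why the loop prints nothing instead of raising an error: on a block page, find_all() matches no session-card elements and returns an empty list, so the body of the for loop never runs. A minimal offline sketch; both HTML snippets are made-up stand-ins, not the real pages, and count_cards is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: a block page, and a page that actually contains a card
blocked_html = '<html><body><h1>Access denied</h1></body></html>'
normal_html = '<div class="session-card"><a href="/session/1">Talk</a></div>'


def count_cards(html):
    """Return how many <div class="session-card"> elements the HTML contains."""
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('div', class_='session-card'))


print(count_cards(blocked_html))  # 0 -> the for loop body never executes
print(count_cards(normal_html))   # 1
```

On the live page, printing res.status_code and something like soup.prettify()[:500] is a quick way to see what you actually received.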

Also, the elements in your selection find_all('div', class_='session-card') have no href attribute of their own; the link sits on an <a> tag inside each card.
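A short offline illustration, using made-up markup in place of the real card (its exact structure is an assumption): indexing the div itself raises KeyError because the attribute belongs to the nested <a>, while card.a reaches that <a>.

```python
from bs4 import BeautifulSoup

# Made-up markup shaped like a session card; the real page's structure is assumed
html = '<div class="session-card"><a href="/session/42">Example talk</a></div>'
card = BeautifulSoup(html, 'html.parser').find('div', class_='session-card')

try:
    card['href']  # the <div> has no href attribute of its own
except KeyError:
    print('div has no href')

print(card.a['href'])  # href of the first <a> inside the card: /session/42
```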

How to fix?

Add a browser-like User-Agent header to your request:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
res = requests.get(url, headers=headers)
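To see what the header changes, the sketch below compares the User-Agent that requests sends by default with the one from headers. Nothing goes over the network: the request is only prepared, not sent. It does not prove that this particular site filters on the User-Agent; that is the diagnosis above.

```python
import requests

url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

# Without custom headers, requests identifies itself as "python-requests/<version>"
print(requests.utils.default_headers()['User-Agent'])

# Build the request without sending it, to inspect the headers that would go out
session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', url, headers=headers))
print(prepared.headers['User-Agent'])  # the Mozilla string from `headers`
```

If you fetch several pages, you can set the header once with session.headers.update(headers) and then call session.get(url).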

In your loop, also select the <a> inside each card and read its href:

b.a["href"]
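One caveat: b.a is None for a card that contains no <a>, and b.a["href"] then raises TypeError ('NoneType' object is not subscriptable). A minimal sketch that skips such cards and, assuming some hrefs may be relative, resolves them against the page URL with urllib.parse.urljoin. The markup and the function name card_links are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

page_url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'

# Hypothetical markup: the second card has no link, so its b.a is None
html = '''
<div class="session-card"><a href="/session/1">Talk 1</a></div>
<div class="session-card"><span>No link here</span></div>
'''


def card_links(html, base):
    """Collect absolute hrefs from each session card, skipping cards without a link."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for b in soup.find_all('div', class_='session-card'):
        a = b.a  # first <a> inside the card, or None
        if a is not None and a.get('href'):
            links.append(urljoin(base, a['href']))  # resolve relative hrefs
    return links


print(card_links(html, page_url))  # ['https://meetings.asco.org/session/1']
```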

Example

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent; without it the site appears to block the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&pageNumber=1&size=20'

res = requests.get(url, headers=headers)
res.raise_for_status()  # stop here if the server answers with an error status
soup = BeautifulSoup(res.content, 'html.parser')

for b in soup.find_all('div', class_='session-card'):
    a = b.a  # the href lives on the first <a> inside each card
    if a is not None and a.get('href'):
        print(a['href'])