I can't access the nested link using Beautifulsoup. My current code is returning an empty list

Time:06-10

I'm writing a script with BeautifulSoup to download all the images from the page below, but my current implementation isn't returning anything.

Link: https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/

import requests
from bs4 import BeautifulSoup


r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/')
soup = BeautifulSoup(r.content, 'lxml')


pictureslist = soup.find_all('div', attrs={'id':'gallery-1','class':'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})

print(pictureslist)

When I run the code, it prints an empty list. I've been at it for an hour and I'm not sure where I'm going wrong.

Answer:

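Your request is failing: the server responds with HTTP status 503, so the response body is not the gallery page and find_all has nothing to match. Many websites block requests that look like they come from bots or scripts.

Add a browser-like User-Agent header to your request and this particular website will accept it.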

import requests
from bs4 import BeautifulSoup

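# Send a browser-like User-Agent; without it this site responds with 503.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'}
r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/', headers=headers)
r.raise_for_status()  # raise on a 4xx/5xx response instead of parsing an error page
soup = BeautifulSoup(r.content, 'lxml')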

pictureslist = soup.find_all('div', attrs={'id': 'gallery-1',
                                           'class': 'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})

print(pictureslist)
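
Once the gallery div is actually present in the response, the nested links can be pulled out of it. The following is only a sketch: the sample markup is hypothetical (loosely modelled on a WordPress-style gallery, not copied from the site), and gallery_links is an illustrative helper name. It looks the div up by id alone, which is usually enough and avoids depending on the exact class string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may be structured differently.
SAMPLE_HTML = """
<div id="gallery-1" class="gallery">
  <dl class="gallery-item"><dt><a href="https://example.com/full/one.jpg"><img src="https://example.com/thumb/one.jpg"></a></dt></dl>
  <dl class="gallery-item"><dt><a href="https://example.com/full/two.jpg"><img src="https://example.com/thumb/two.jpg"></a></dt></dl>
</div>
"""


def gallery_links(html, gallery_id="gallery-1"):
    """Return the href of every <a> nested inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    gallery = soup.find("div", id=gallery_id)
    if gallery is None:
        # The div is missing, e.g. because the server returned an error page.
        return []
    return [a["href"] for a in gallery.find_all("a", href=True)]


print(gallery_links(SAMPLE_HTML))
```

With the real page you would pass r.text (or r.content) instead of SAMPLE_HTML; whether the links point at full-size images is an assumption to check in the browser's inspector.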

In the future, to find the root cause of issues with requests, work your way down from the top. Set a breakpoint (or print r.status_code) right after the get request to confirm that it actually returned the website's data (HTTP status 200). Don't assume it is returning what you think it is.
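
As a rough illustration of that check, the helper below summarizes a response before any parsing happens. The names summarize_response and check_page are made up for this sketch; the helper only assumes an object with status_code and text attributes, which is what requests returns.

```python
def summarize_response(response, preview_chars=200):
    """Return a small dict describing a requests-style response."""
    body = response.text or ""
    return {
        "status_code": response.status_code,
        "ok": 200 <= response.status_code < 300,  # True only for 2xx statuses
        "length": len(body),
        "preview": body[:preview_chars],  # start of the body, for a quick look
    }


def check_page(url, headers=None):
    """Fetch url and print what actually came back (requires network access)."""
    import requests  # imported here so summarize_response has no dependencies

    summary = summarize_response(requests.get(url, headers=headers))
    print(summary["status_code"], summary["ok"], summary["length"])
    if not summary["ok"]:
        print("Request failed; body starts with:", summary["preview"])
    return summary
```

Calling check_page with the URL from the question, with and without the headers dict, should make the difference between the blocked and the accepted request visible before BeautifulSoup is involved at all.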
