I can't access the nested link using BeautifulSoup. My current code is returning an empty list

I'm writing a script with BeautifulSoup that will download all the images from a page, but my current implementation isn't returning anything.

Link: https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/

import requests
from bs4 import BeautifulSoup


r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/')
soup = BeautifulSoup(r.content, 'lxml')


pictureslist = soup.find_all('div', attrs={'id':'gallery-1','class':'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})

print(pictureslist)

When I run the code, it returns an empty list. I've been at it for an hour and I'm not sure where I'm going wrong.

CodePudding user response:

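It's because the server rejects your request with an HTTP 503 error, so the response you are parsing is an error page that doesn't contain the gallery div. Many websites block bots and scripts.

Add a browser-like User-Agent header to your request and this particular website will accept it.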

import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'}
r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')

pictureslist = soup.find_all('div', attrs={'id': 'gallery-1',
                                           'class': 'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})

print(pictureslist)
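
Once the gallery div comes back, the nested links still have to be pulled out of it. Here is a rough sketch of one way to do that. It assumes each thumbnail is wrapped in a link to the full-size image, which I have not verified against this page. SAMPLE_HTML, the example.com URLs and the gallery_links helper are made up for illustration. It also looks the div up by id only, so it keeps working if the class string changes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's gallery markup; the real HTML may differ.
SAMPLE_HTML = """
<div id="gallery-1" class="gallery galleryid-268780 gallery-columns-3 gallery-size-medium">
  <dl class="gallery-item"><dt>
    <a href="https://example.com/full/one.jpg"><img src="https://example.com/thumb/one.jpg"></a>
  </dt></dl>
  <dl class="gallery-item"><dt>
    <a href="https://example.com/full/two.jpg"><img src="https://example.com/thumb/two.jpg"></a>
  </dt></dl>
</div>
"""


def gallery_links(html, gallery_id="gallery-1"):
    """Return the href of every link nested inside the gallery div."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the id alone; ids are unique, so the long class string is not needed.
    gallery = soup.find("div", id=gallery_id)
    if gallery is None:
        # No gallery in this document (for example, an error page was returned).
        return []
    # href=True skips any <a> tags that have no href attribute.
    return [a["href"] for a in gallery.find_all("a", href=True)]


print(gallery_links(SAMPLE_HTML))
```

For the real page you would pass r.content instead of SAMPLE_HTML. If the links turn out to be relative, they would need to be joined with the page URL before downloading.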

In the future, to root-cause issues with requests, work your way down from the top. Set a breakpoint right after your get request (or print r.status_code) to confirm it is in fact returning the website's data (HTTP 200). Don't just assume it is returning what you think it is.
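
If you'd rather not use a debugger, the same check can be written into the script. This is only a minimal sketch: check_response is a hypothetical helper name, and it accepts any object with status_code and content attributes, such as the one requests.get() returns.

```python
def check_response(response):
    """Return the response body, or raise if the server did not answer 200 OK.

    `response` is anything with `status_code` and `content` attributes,
    such as the object returned by requests.get().
    """
    if response.status_code != 200:
        raise RuntimeError(
            f"Request failed with HTTP {response.status_code}; "
            "the body is probably an error page, not the gallery."
        )
    return response.content


# Typical use with the script above (needs network access):
#   r = requests.get(url, headers=headers)
#   soup = BeautifulSoup(check_response(r), 'lxml')
```

requests also has a built-in r.raise_for_status(), which raises an HTTPError for 4xx and 5xx responses. Either way, the failure shows up at the request instead of as an empty list further down.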