I'm using BeautifulSoup to write a script that will download all the images from a gallery page, but my current implementation isn't returning anything.
Link: https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/')
soup = BeautifulSoup(r.content, 'lxml')
pictureslist = soup.find_all('div', attrs={'id':'gallery-1','class':'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})
print(pictureslist)
When I run the code, it prints an empty list. I've been at it for an hour and I'm not sure where I'm going wrong.
Answer:
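That's because the request itself fails: the server responds with HTTP 503, so the HTML you parse is an error page that doesn't contain the gallery div. Many websites block bots and scripts this way.
Add a browser-like User-Agent header to your request and this particular website will accept it.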
import requests
from bs4 import BeautifulSoup
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'}
r = requests.get('https://www.f1-fansite.com/f1-wallpaper/wallpaper-photos-monaco-f1-gp/', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
pictureslist = soup.find_all('div', attrs={'id': 'gallery-1',
'class': 'gallery galleryid-268780 gallery-columns-3 gallery-size-medium'})
print(pictureslist)
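Once the div is found, the image URLs still have to be pulled out of it. Here is a minimal sketch of that step, assuming the gallery contains ordinary img tags with a src attribute. The SAMPLE_HTML string and the gallery_image_urls helper are made up for illustration and the real page's markup may differ. It selects by id only, since matching on the full multi-class string works only when the class attribute matches that string exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the gallery markup; the real page may differ.
SAMPLE_HTML = """
<div id="gallery-1" class="gallery galleryid-268780 gallery-columns-3 gallery-size-medium">
  <a href="https://example.com/full/photo-1.jpg"><img src="https://example.com/thumb/photo-1.jpg"></a>
  <a href="https://example.com/full/photo-2.jpg"><img src="https://example.com/thumb/photo-2.jpg"></a>
</div>
<img src="https://example.com/logo.png">
"""


def gallery_image_urls(html, gallery_id="gallery-1"):
    """Return the src of every <img> inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: any <img> with a src attribute nested under #gallery_id.
    return [img["src"] for img in soup.select(f"#{gallery_id} img[src]")]


print(gallery_image_urls(SAMPLE_HTML))
```

With the real page you would pass r.content instead of SAMPLE_HTML. If the thumbnails link to larger files, the href of the surrounding a tag may be the URL you actually want to download.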
In the future, to find the root cause of issues with requests, work your way down from the request itself. Set a breakpoint (or print r.status_code) right after the get call to confirm it actually returned the website's data (HTTP 200). Don't just assume it is returning what you think it is.
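One way to build that check into the script is sketched below. The fetch_checked name and the 10 second timeout are my own choices, not anything the site requires. It prints the status code and then uses raise_for_status(), which raises requests.HTTPError for 4xx and 5xx responses, so a blocked request fails loudly instead of quietly producing an empty list.

```python
import requests


def fetch_checked(url, headers=None, timeout=10):
    """GET a URL, report the status code, and raise on 4xx/5xx responses."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Quick sanity check before any parsing happens.
    print(response.status_code, len(response.content))
    # Raises requests.HTTPError for error statuses such as 503.
    response.raise_for_status()
    return response
```

You would then write r = fetch_checked(url, headers=headers) in place of the plain requests.get call. Given the 503 described above, calling it without the User-Agent header should raise an exception rather than return a page that has no gallery in it.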