I am following this web scraping tutorial and I am getting an error.
My code is as follows:
import requests
from bs4 import BeautifulSoup

URL = "http://books.toscrape.com/"  # Replace this with the website's URL
getURL = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"})
print(getURL.status_code)

soup = BeautifulSoup(getURL.text, 'html.parser')
images = soup.find_all('img')
print(images)

imageSources = []
for image in images:
    imageSources.append(image.get("src"))
print(imageSources)

for image in imageSources:
    webs = requests.get(image)
    with open("images/" + image.split("/")[-1], "wb") as f:
        f.write(webs.content)
Unfortunately, I am getting the following error on the line webs = requests.get(image):
MissingSchema: Invalid URL 'media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg': No schema supplied. Perhaps you meant http://media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg?
I am totally new to scraping and I don't know what this means. Any suggestions are appreciated.
Answer:
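You need to supply a complete URL (scheme plus host) in this line:
webs = requests.get(image)
The src values on this page, such as media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg, are relative paths: they only make sense relative to the page they came from. On its own, that string is not a valid URL because it has no scheme (http:// or https://), which is exactly what the MissingSchema error is telling you.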
For example, prepend the site's base URL to each src:
full_image_url = f"http://books.toscrape.com/{image}"
This gives you:
http://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg
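Prepending a fixed string works for this page, but it produces a broken address if a src is already absolute (or starts with "/"). A more general option, swapped in here and not part of the original answer, is urllib.parse.urljoin from the standard library, which resolves a src against the URL of the page it was found on. The second and third src values below, and the catalogue page URL, are hypothetical variants for illustration.

```python
from urllib.parse import urljoin

page_url = "http://books.toscrape.com/"

# The first src comes from the error message; the other two are
# hypothetical variants (root-relative and already absolute)
sources = [
    "media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
    "/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
    "http://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
]

# urljoin resolves each src against the page it came from;
# an already absolute src is returned unchanged
resolved = [urljoin(page_url, src) for src in sources]
for url in resolved:
    print(url)

# "../" segments are resolved as well (hypothetical catalogue page and src)
print(urljoin("http://books.toscrape.com/catalogue/page-2.html", "../media/cache/x.jpg"))
```

All three entries in resolved come out as the same absolute image URL. In the question's loop, that would mean calling requests.get(urljoin(URL, image)) instead of requests.get(image).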
Full code:
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://books.toscrape.com/").text, 'html.parser')
images = soup.find_all('img')

imageSources = []
for image in images:
    imageSources.append(image.get("src"))

for image in imageSources:
    # Turn the relative src into an absolute URL before requesting it
    full_image_url = f"http://books.toscrape.com/{image}"
    webs = requests.get(full_image_url)
    # Save the image in the current directory under its original file name
    with open(image.split("/")[-1], "wb") as f:
        f.write(webs.content)
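The full code above saves into the current directory. The question's version writes into images/ instead, and open(..., "wb") raises FileNotFoundError if that folder does not exist yet. A minimal sketch of a helper for that case follows; the name local_path and its folder parameter are hypothetical, not part of requests or Beautiful Soup.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_path(image_url, folder="images"):
    """Hypothetical helper: map an image URL to a file path inside `folder`.

    Creates `folder` if it is missing, because open(..., "wb") does not
    create directories. Only the URL's path component is used, so a query
    string such as "?v=2" does not end up in the file name.
    """
    os.makedirs(folder, exist_ok=True)
    filename = posixpath.basename(urlsplit(image_url).path)
    return os.path.join(folder, filename)
```

Inside the download loop you could then write with open(local_path(full_image_url), "wb") as f: f.write(webs.content).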