Problem Following Web Scraping Tutorial Using Python

Time:01-15

I am following this web scraping tutorial and I am getting an error.

My code is as follows:

import requests
URL = "http://books.toscrape.com/" # Replace this with the website's URL
getURL = requests.get(URL, headers={"User-Agent":"Mozilla/5.0"})
print(getURL.status_code)

from bs4 import BeautifulSoup

soup = BeautifulSoup(getURL.text, 'html.parser')

images = soup.find_all('img')
print(images)

imageSources=[]

for image in images:
  imageSources.append(image.get("src"))
print(imageSources)

for image in imageSources:
  webs=requests.get(image)
  open("images/" + image.split("/")[-1], "wb").write(webs.content)

Unfortunately, I am getting an error in the line webs=requests.get(image), which is as follows:

MissingSchema: Invalid URL 'media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg': No schema supplied. Perhaps you meant http://media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg?

I am totally new to scraping and I don't know what this means. Any suggestion is appreciated.

CodePudding user response:

You need to supply a proper URL in this line:

webs=requests.get(image)

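The value media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg is a relative path, not a complete URL: it has no scheme (such as http://) and no host, so requests cannot tell where to send the request. Hence the MissingSchema error.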
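
If it helps to see why requests rejects that string, here is a small sketch using the standard library's urllib.parse. The helper name has_scheme is made up for illustration and is not part of requests; it only checks whether a string carries both a scheme and a host.

```python
from urllib.parse import urlparse


def has_scheme(url: str) -> bool:
    """Return True if the URL has both a scheme (e.g. http) and a host.

    Hypothetical helper for illustration only; not part of requests.
    """
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


# The src value taken from the <img> tag on the page (a relative path).
relative = "media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"
# The same path with the site's base URL in front of it.
absolute = "http://books.toscrape.com/" + relative

print(has_scheme(relative))  # False
print(has_scheme(absolute))  # True
```

The first value is what the loop was passing to requests.get, which is why the call fails before any network traffic happens.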

For example, put the site's base URL in front of the image path:

full_image_url = f"http://books.toscrape.com/{image}"

This gives you:

http://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg

Full code:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://books.toscrape.com/").text, 'html.parser')
images = soup.find_all('img')

imageSources = []
for image in images:
    imageSources.append(image.get("src"))

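for image in imageSources:
    # Prefix the site's base URL so requests receives an absolute URL.
    full_image_url = f"http://books.toscrape.com/{image}"
    webs = requests.get(full_image_url)
    # Save under the image's own file name; "with" closes the file promptly.
    with open(image.split("/")[-1], "wb") as f:
        f.write(webs.content)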
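
As a possible alternative to building the URL with an f-string, the standard library's urllib.parse.urljoin resolves a src value against the page it came from. This is only a sketch: the helper name absolute_image_url is made up, and the second and third inputs below are illustrative values, not taken from the question.

```python
from urllib.parse import urljoin

BASE_URL = "http://books.toscrape.com/"


def absolute_image_url(page_url: str, src: str) -> str:
    """Resolve an <img> src against the URL of the page it was found on.

    Hypothetical helper for illustration; urljoin does the actual work.
    """
    return urljoin(page_url, src)


# Relative path from the question: the base URL is put in front of it.
print(absolute_image_url(BASE_URL, "media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"))

# Illustrative case: a "../" path found on a page in a subdirectory.
print(absolute_image_url(BASE_URL + "catalogue/page-2.html", "../media/cache/x.jpg"))

# Illustrative case: a src that is already absolute is returned unchanged.
print(absolute_image_url(BASE_URL, "http://example.com/a.png"))
```

The returned string can be passed to requests.get in place of full_image_url. The advantage over plain string concatenation is that "../" segments and already-absolute src values are handled without extra checks.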