Scraping images and changing their names with Python

I have a project that scrapes images from Tumblr using Python. I want to download the images from the URLs that the scraper collects.

This is the entire code:

import requests
from bs4 import BeautifulSoup
import shutil
search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue


for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Following an answer on this post, I changed the code to use requests and shutil:

for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

And now I am getting this error:

Traceback (most recent call last):
  File "/home/user/folder/Information.py", line 28, in <module>
    url.raw.decode_content = True
AttributeError: 'str' object has no attribute 'raw'

CodePudding user response:

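The values in image_urls are plain strings, so they have no raw attribute. You have to make a second request with each image URL. The response to that request exposes the raw stream, which you can then save as the image.

Replace the download loop with the code below: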

for source, image_urls in data.items():
    for url in image_urls:
        # Request the image itself; stream=True exposes the body via .raw
        img_scrape = requests.get(url, stream=True)

        if img_scrape.status_code == 200:
            # Decode gzip/deflate transfer encoding while reading the stream
            img_scrape.raw.decode_content = True
            with open(source, 'wb') as f:
                # Copy the raw response stream into the file
                shutil.copyfileobj(img_scrape.raw, f)
            print('Image sucessfully Downloaded: ', source)
        else:
            print('Image Couldn\'t be retrieved')

Output:

Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  laravel
Image sucessfully Downloaded:  huariqueje
Image sucessfully Downloaded:  sweetd3lights
Image sucessfully Downloaded:  shesinthegrove
Image sucessfully Downloaded:  careful-disorder
Image sucessfully Downloaded:  beifongkendo
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  bradsbackpack
Image sucessfully Downloaded:  pensamentsisomnis
Image sucessfully Downloaded:  frankfurtphoto
........
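
One thing to watch in the output: names such as pics-bae and traveltoslovenia appear more than once, because every image from the same blog is written to the same file name, so later images overwrite earlier ones. Below is a minimal sketch of one way to give each image its own name. The helper build_filenames is hypothetical (it is not part of the original code), the ".jpg" fallback extension is an assumption, and the example.com URLs in the usage note are placeholders.

```python
import os
from urllib.parse import urlparse


def build_filenames(data):
    """Pair every image URL with a unique file name, e.g. 'pics-bae_1.jpg'.

    `data` has the same shape as in the question: {source: [url, ...]}.
    Returns a list of (url, filename) tuples in iteration order.
    """
    pairs = []
    for source, image_urls in data.items():
        # Number the images per source so names never collide
        for index, url in enumerate(image_urls, start=1):
            # Take the extension from the URL path, ignoring any query string;
            # fall back to ".jpg" when the path has none (an assumption)
            extension = os.path.splitext(urlparse(url).path)[1] or ".jpg"
            pairs.append((url, f"{source}_{index}{extension}"))
    return pairs
```

In the download loop you would then iterate over build_filenames(data) and call open(filename, 'wb') for each (url, filename) pair instead of open(source, 'wb'). For example, {"pics-bae": ["https://example.com/a.jpg", "https://example.com/b.png"]} would yield the names pics-bae_1.jpg and pics-bae_2.png.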