I have a project that scrapes images from Tumblr using Python. I want to download the images from the links I get while scraping.
This is the entire code:
import requests
from bs4 import BeautifulSoup
import shutil

search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")
data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue

for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')
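The list comprehension above keeps only the 500w candidate from each srcset attribute, which is a comma-separated list of "URL width" pairs. Below is a minimal sketch of the same parsing as standalone functions, assuming (as the original split(",") already does) that the URLs contain no commas. The names parse_srcset and pick_width and the example srcset value are hypothetical illustrations, not part of the original code or of BeautifulSoup.

```python
def parse_srcset(srcset):
    """Split a srcset attribute into (url, width) pairs.

    Assumes each candidate looks like "URL 500w" and that URLs contain
    no commas, the same assumption the question's split(",") makes.
    """
    candidates = []
    for part in srcset.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = part.split()
        url = pieces[0]
        # Keep only width descriptors such as "500w"; skip density ones like "2x".
        if len(pieces) > 1 and pieces[1].endswith("w") and pieces[1][:-1].isdigit():
            candidates.append((url, int(pieces[1][:-1])))
    return candidates


def pick_width(srcset, width=None):
    """Return the URL for an exact width, or the widest candidate if width is None."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    if width is not None:
        for url, w in candidates:
            if w == width:
                return url
        return None
    return max(candidates, key=lambda c: c[1])[0]


# Hypothetical srcset value, shaped like the ones the question scrapes.
example = ("https://64.media.tumblr.com/abc/s250x400/pic.jpg 250w, "
           "https://64.media.tumblr.com/abc/s500x750/pic.jpg 500w, "
           "https://64.media.tumblr.com/abc/s1280x1920/pic.jpg 1280w")
print(pick_width(example, 500))  # the 500w URL, as the question's code selects
print(pick_width(example))       # the widest candidate instead
```

Passing width=None is one way to grab the largest version Tumblr offers rather than hard-coding 500w; whether larger sizes are always present is an assumption.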
Following an answer on this post, I changed the code to use requests and shutil:
for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')
And now I am getting this error:
Traceback (most recent call last):
  File "/home/user/folder/Information.py", line 28, in <module>
    url.raw.decode_content = True
AttributeError: 'str' object has no attribute 'raw'
Answer:
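The error happens because url is just a string taken from the srcset attribute, so it has no raw attribute; only the Response object returned by requests.get has one. You have to make another request with each image URL, check that response's status code (not the search page's), and then read the response in raw form to save the image.
Replace the download loop with the code below: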
for source, image_urls in data.items():
    for url in image_urls:
        # make a new request with the image URL
        img_scrape = requests.get(url, stream=True)
        if img_scrape.status_code == 200:
            with open(source, 'wb') as f:
                img_scrape.raw.decode_content = True
                # save the image from the raw response stream
                shutil.copyfileobj(img_scrape.raw, f)
            print('Image sucessfully Downloaded: ', source)
        else:
            print('Image Couldn\'t be retrieved')
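One thing to watch: the loop above opens every image from the same blog under the same file name (source), so later images overwrite earlier ones, and the files get no extension. The output below lists traveltoslovenia three times, but only one file with that name would remain. Here is a minimal sketch of one way around that, giving each image its own name built from the blog name, an index, and the extension found in the URL path. The helper names image_filename and download_all, the images folder, the ".jpg" fallback, and the 10-second timeout are all hypothetical choices, not part of the original answer.

```python
import os
import shutil
from urllib.parse import urlparse


def image_filename(source, index, url, folder="."):
    """Build a distinct file name per image: <source>_<index><ext>.

    The extension comes from the URL path; ".jpg" is an assumed fallback.
    """
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return os.path.join(folder, f"{source}_{index}{ext}")


def download_all(data, folder="images"):
    """Download every URL in data ({blog name: [image URLs]}) into folder."""
    import requests  # third-party; imported here so image_filename works without it

    os.makedirs(folder, exist_ok=True)
    for source, image_urls in data.items():
        for index, url in enumerate(image_urls):
            path = image_filename(source, index, url, folder)
            # One request per image; stream=True exposes the body as resp.raw.
            with requests.get(url, stream=True, timeout=10) as resp:
                if resp.status_code != 200:
                    print("Image couldn't be retrieved:", url)
                    continue
                resp.raw.decode_content = True
                with open(path, "wb") as f:
                    shutil.copyfileobj(resp.raw, f)
            print("Image successfully downloaded:", path)
```

With the scraping part unchanged, calling download_all(data) after the for article in articles loop would replace the download loop entirely.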
Output:
Image sucessfully Downloaded: pics-bae
Image sucessfully Downloaded: pics-bae
Image sucessfully Downloaded: laravel
Image sucessfully Downloaded: huariqueje
Image sucessfully Downloaded: sweetd3lights
Image sucessfully Downloaded: shesinthegrove
Image sucessfully Downloaded: careful-disorder
Image sucessfully Downloaded: beifongkendo
Image sucessfully Downloaded: traveltoslovenia
Image sucessfully Downloaded: traveltoslovenia
Image sucessfully Downloaded: traveltoslovenia
Image sucessfully Downloaded: bradsbackpack
Image sucessfully Downloaded: pensamentsisomnis
Image sucessfully Downloaded: frankfurtphoto
........
..........