Downloading images from Instagram with wget

I'm trying to download images from Instagram. This is my code:

keywords =['cat','dog']
hashtags = ['cute_cat','cute_dog']

for keyword, tag in zip(keywords, hashtags):

    driver.get("https://www.instagram.com/explore/tags/" + tag + "/")

    n_scrolls = 10
    time.sleep(5)

    for j in range(0, n_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        images = driver.find_elements_by_tag_name('img')
        images = [image.get_attribute('src') for image in images]
        images = images[:-3]

        path = os.getcwd()
        path = os.path.join(path)

        for image in images:
            save_as = os.path.join(keyword + '.jpg')
            wget.download(image, save_as)

The problem is that either wget isn't working properly or I'm doing something wrong, and I can't figure out which. The call fails with:

ValueError: not enough values to unpack (expected 2, got 1)

I already pass both the URL and the destination to wget.download(image, save_as), but it keeps giving me this error. Can someone help me, please?

The full error message:

ValueError                                Traceback (most recent call last)

     21 for image in images:
     22     save_as = os.path.join(keyword + '.jpg')
---> 23     wget.download(image, save_as)

    524 else:
    525     binurl = url
--> 526 (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
    527 filename = detect_filename(url, out, headers)
    528 if outdir:

    224 """
    225 Retrieve a URL into a temporary location on disk.
    226

    237 data file as well as the resulting HTTPMessage object.
    238 """
    239 url_type, path = _splittype(url)
--> 241 with contextlib.closing(urlopen(url, data)) as fp:
    242     headers = fp.info()
    244     # Just return the local path and the "headers" for file://

-> 1656 mediatype, data = data.split(",",1)
   1658 # even base64 encoded data URLs might be quoted so unquote in any case:
   1659 data = unquote_to_bytes(data)

ValueError: not enough values to unpack (expected 2, got 1)

CodePudding user response:

I isolated a single image URL from Instagram and ran your download code against it:

import os
import wget

image_url = "https://scontent-lcy1-2.cdninstagram.com/v/t51.2885-15/328075461_1175323806446003_923403735361226857_n.jpg?stp=dst-jpg_e35&_nc_ht=scontent-lcy1-2.cdninstagram.com&_nc_cat=111&_nc_ohc=O2DMK-Da8K8AX--kBZ0&edm=AGyKU4gBAAAA&ccb=7-5&ig_cache_key=MzAyODczNTQ0NjIwNTAzNjIzMQ==.2-ccb7-5&oh=00_AfCp4UuaO7KC2RlR1W-qdqgYh-7QyXaqlPMlGPgeYy_bMQ&oe=63E02A10&_nc_sid=4cb768"
keyword = "test_keyword"

       
path = os.getcwd()
path = os.path.join(path)

save_as = os.path.join(keyword + '.jpg')
wget.download(image_url, save_as)

And this successfully downloads an image from Instagram.

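I recommend debugging the code that parses the page and checking the format of the URLs it produces. I suspect the URL you're retrieving from the page is not in the correct format; you may be picking the wrong node to get the image URL. The last frame of your traceback, the split on "," inside urllib, is the code path that handles data: URLs, which suggests that at least one src value is an inline data: URL rather than an http(s) link.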
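As a rough way to check the format of the scraped values, the sketch below counts the URL scheme of each entry. The function name summarize_schemes and the sample list are made up for illustration; in your script you would pass the images list built from get_attribute('src') instead.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_schemes(srcs):
    """Count the URL schemes found in a list of img src values."""
    counts = Counter()
    for src in srcs:
        # get_attribute('src') can return None when the attribute is missing
        scheme = urlparse(src).scheme if src else "(empty)"
        counts[scheme or "(none)"] += 1
    return counts


# Hypothetical sample standing in for the scraped list
sample = [
    "https://example.com/a.jpg",
    "data:image/gif;base64,R0lGODlhAQABAAAAACw=",
    None,
]

for scheme, count in summarize_schemes(sample).items():
    print(scheme, count)
```

Anything counted under a scheme other than http or https is a candidate for the value that makes wget.download fail.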

If nothing is obviously wrong, also print the URL your code generates to standard output and try downloading it directly from the command line with the native wget tool. I suspect it will fail in the same way, which will let you debug it further.
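If the printed URLs turn out to include values that are not plain web links, one possible workaround is to skip them before downloading. This is only a sketch under that assumption: the helper name downloadable is hypothetical, and it does not address whether the remaining links are the images you actually want.

```python
from urllib.parse import urlparse


def downloadable(srcs):
    """Keep only the src values that use the http or https scheme."""
    return [
        src
        for src in srcs
        # Drops None, empty strings, data: URLs and relative paths
        if src and urlparse(src).scheme in ("http", "https")
    ]


# Hypothetical sample standing in for the scraped list
srcs = [
    "https://example.com/a.jpg",
    "data:image/gif;base64,R0lGODlhAQABAAAAACw=",
    None,
    "http://example.com/b.jpg",
]
print(downloadable(srcs))
```

In the original loop this would mean iterating over downloadable(images) rather than images when calling wget.download(image, save_as).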