extracting images from links in excel file

Time:06-27

I have to extract some images from an Excel file (saved with a .csv extension) whose rows contain links. I have tried my best, but I don't know how to do it; I need some Python API for that. Here is one row of the data, which contains a link:

2,1.97E-05,35767,46274998,5/17/20,2020,1913087,https://www.inaturalist.org/observations/46274998,1,0,41.94650813,-72.83742948,Hartford,Connecticut,United States,Alliaria petiolata,4,Flowering,,c = digital manipulation
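Although the question calls it an Excel file, the sample row is plain comma-separated text, so Python's built-in `csv` module can read it without any Excel library. The following is a minimal sketch that pulls every iNaturalist observation link (and its numeric ID) out of the rows. Rather than assuming a fixed column index, it matches any field that looks like an observation URL, which also means a header row is simply ignored. The function name `observation_urls` and the file name `observations.csv` are hypothetical.

```python
import csv
import io
import re

# Matches an iNaturalist observation URL and captures its numeric ID.
OBS_URL = re.compile(r"^https?://(?:www\.)?inaturalist\.org/observations/(\d+)")


def observation_urls(rows):
    """Yield (observation_id, url) for every field that looks like an
    iNaturalist observation link, regardless of which column it is in."""
    for row in rows:
        for field in row:
            match = OBS_URL.match(field.strip())
            if match:
                yield int(match.group(1)), field.strip()


# The sample row from the question. For a real file, pass
# csv.reader(open("observations.csv", newline="")) instead (file name hypothetical).
SAMPLE = ("2,1.97E-05,35767,46274998,5/17/20,2020,1913087,"
          "https://www.inaturalist.org/observations/46274998,1,0,"
          "41.94650813,-72.83742948,Hartford,Connecticut,United States,"
          "Alliaria petiolata,4,Flowering,,c = digital manipulation\n")

if __name__ == "__main__":
    for obs_id, url in observation_urls(csv.reader(io.StringIO(SAMPLE))):
        print(obs_id, url)
```

Matching on the URL pattern instead of a column position keeps the sketch working if the exported file gains or reorders columns.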

Answer:

Here is the code. Note that it scrapes the species' photo gallery page on iNaturalist rather than reading the links from the CSV:

import os
import re
import urllib.request

# Directory where the images are saved
DIRECTORY = "book"

# URL of the HTML page that lists the images
URL = "https://www.inaturalist.org/taxa/56061-Alliaria-petiolata/browse_photos"

# Regex that extracts the image URLs from the HTML page.
# The "+" quantifier after [^"] is required to match the whole URL.
# The host part (\d.bp.inaturalist.org) resembles Blogger's image-host
# pattern and probably does not match iNaturalist; inspect the page
# source and adjust it to the hosts the images are actually served from.
REGEX = r'(?<=<a href=")https?://\d\.bp\.inaturalist\.org/[^"]+'

# Prefix of the saved image file names
PREFIX = 'page_'

# Create the output directory if it does not exist yet
os.makedirs(DIRECTORY, exist_ok=True)

contents = urllib.request.urlopen(URL).read().decode('utf-8')
links = re.findall(REGEX, contents)

print("Found {} links".format(len(links)))
print("Starting download...")

total = len(links)
downloaded = 0
skipped = 0
for page_number, link in enumerate(links, start=1):
    filename = os.path.join(DIRECTORY, "{}{}.jpg".format(PREFIX, page_number))
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(link, filename)
        downloaded += 1
        print("done: {} ({}/{})".format(filename, page_number, total))
    else:
        # Skip files that already exist from a previous run
        skipped += 1
        print("skip: {} ({}/{})".format(filename, page_number, total))

print("Downloaded {} files, skipped {} existing".format(downloaded, skipped))
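The script above downloads a species gallery, not the specific observations listed in the CSV, and if the gallery builds its image list with JavaScript the regex will find nothing in the raw HTML. An alternative sketch fetches each observation's JSON from iNaturalist's public API (`https://api.inaturalist.org/v1/observations/<id>`) and downloads the photos it lists. It assumes the response has the shape `results[i]["photos"][j]["url"]`, that each `url` points at the `square` thumbnail, and that replacing `square` with `large` (or `original`) in the path yields a bigger rendition; verify this against a live response first. The function names `photo_urls` and `download_observation_photos` and the `images` directory are hypothetical. Keep requests slow to respect the API's rate limits, and check each photo's license before reusing it.

```python
import json
import os
import urllib.request

# Public iNaturalist API v1 endpoint for a single observation
API = "https://api.inaturalist.org/v1/observations/{}"


def photo_urls(observation_json, size="large"):
    """Return photo URLs from an observation API response.

    Assumes each photo's "url" points at the "square" thumbnail and that
    swapping that path segment for another size name gives that rendition.
    """
    urls = []
    for result in observation_json.get("results", []):
        for photo in result.get("photos", []):
            url = photo.get("url")
            if url:
                urls.append(url.replace("/square.", "/{}.".format(size)))
    return urls


def download_observation_photos(obs_id, directory="images", size="large"):
    """Fetch one observation's JSON and save each of its photos to disk."""
    os.makedirs(directory, exist_ok=True)
    with urllib.request.urlopen(API.format(obs_id)) as response:
        data = json.load(response)
    saved = []
    for index, url in enumerate(photo_urls(data, size), start=1):
        # Keep the original file extension; fall back to .jpg if none
        extension = os.path.splitext(url.split("?")[0])[1] or ".jpg"
        filename = os.path.join(directory, "{}_{}{}".format(obs_id, index, extension))
        if not os.path.isfile(filename):
            urllib.request.urlretrieve(url, filename)
        saved.append(filename)
    return saved
```

For example, `download_observation_photos(46274998)` would save the photos of the sample observation from the question into `images/`. To process the whole file, loop over the observation IDs found in the CSV and pause briefly (for example with `time.sleep(1)`) between calls.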