This code downloads images from the links in a column called "link" in a CSV file and saves each one under the name found in another column called "name" ("title" in the code below). It stops working when it encounters a non-English character. I want the code to work with non-English characters as well.
Here is the code:
import urllib.request
import csv
import os

with open('booklogo.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
        if row["link"] != '' and row["title"] != '':
            name, ext = os.path.splitext(row['link'])
            if ext == '':
                ext = ".png"
            title_filename = f"{row['title']}{ext}".replace('/', '-')
            urllib.request.urlretrieve(row['link'], title_filename)
Here is the error:

OSError
Input In [5], in <cell line: 5>()
     13 ext = ".png"
     14 title_filename = f"{row['title']}{ext}".replace('/', '-')
---> 15 urllib.request.urlretrieve(row['link'], title_filename)

File ~\anaconda3\lib\urllib\request.py:249, in urlretrieve(url, filename, reporthook, data)
    247 # Handle temporary file setup.
    248 if filename:
--> 249     tfp = open(filename, 'wb')
    250 else:
    251     tfp = tempfile.NamedTemporaryFile(delete=False)

OSError: [Errno 22] Invalid argument: 'Albert ?eská republika.png'
CodePudding user response:
I think you're correct (in your comment below) that it's probably the question mark.
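To check that theory quickly, here is a minimal sketch that lists the characters in a name that Windows refuses in file names. The helper name `find_reserved_chars` is made up for illustration, and I'm assuming you are on Windows (the `~\anaconda3` path in the traceback suggests so).

```python
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def find_reserved_chars(filename):
    """Return the reserved characters found in filename, in order of first appearance."""
    found = []
    for ch in filename:
        if ch in WINDOWS_RESERVED and ch not in found:
            found.append(ch)
    return found


# The name from the traceback contains a literal question mark.
print(find_reserved_chars('Albert ?eská republika.png'))  # ['?']
```

If this prints `['?']` for your failing row, the accented letters are not the problem; the `?` is.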
You need to sanitize your filename. Python's standard library has no function for this, so we'll borrow from the most popular answer to the same question, "Turn a string into a valid filename?".
You'll need to add this function to your file:
import unicodedata
import re

def slugify(value, allow_unicode=False):
    """
    Taken from https://github.com/django/django/blob/master/django/utils/text.py
    Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
    dashes to single dashes. Remove characters that aren't alphanumerics,
    underscores, or hyphens. Convert to lowercase. Also strip leading and
    trailing whitespace, dashes, and underscores.
    """
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')
Then modify your existing code like this:
...
# Sanitize filename. Will get rid of periods too, so add ext after
title_filename = slugify(row['title'])
title_filename += ext
...
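Since you want to keep the non-English characters, here is a rough sketch of how the pieces could fit together with `allow_unicode=True`. The helper `build_filename` and the `example.com` URLs are made up for illustration, and I'm assuming your file system accepts Unicode names. It also takes the extension from the path part of the URL only, which is my own tweak rather than something from your code, so that a query string cannot end up in the filename.

```python
import os
import re
import unicodedata
from urllib.parse import urlparse


def slugify(value, allow_unicode=False):
    """slugify from Django's utils/text.py, repeated so this block runs on its own."""
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')


def build_filename(title, link, allow_unicode=True):
    """Hypothetical helper: sanitized title plus the extension taken from the link."""
    # Use only the path part of the URL so a query string such as
    # "?size=large" cannot leak into the extension.
    ext = os.path.splitext(urlparse(link).path)[1]
    if ext == '':
        ext = '.png'
    return slugify(title, allow_unicode=allow_unicode) + ext


# Unicode letters are kept, spaces become dashes, the query string is ignored.
print(build_filename('Albert Česká republika', 'https://example.com/logo.jpg?size=large'))
# albert-česká-republika.jpg

# With allow_unicode=False the accents are stripped instead.
print(build_filename('Albert Česká republika', 'https://example.com/logo', allow_unicode=False))
# albert-ceska-republika.png
```

In your loop you would then pass `build_filename(row['title'], row['link'])` as the second argument to `urllib.request.urlretrieve`.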