I have a script, below, that downloads files from the links listed in a single CSV file. It works well, and all the files are downloaded into the root of my 'Python Project' folder.
I would like to add two features. First, process multiple (20 or more) CSV files instead of just one, so that I don't have to change the name manually in open('name1.csv') every time the script finishes. Second, place the downloads in a folder with the same name as the CSV file they come from. Hopefully that's clear enough :)
Then I could have:
- name1.csv -> name1 folder -> download from name1 csv
- name2.csv -> name2 folder -> download from name2 csv
- name3.csv -> name3 folder -> download from name3 csv
- ...
Any help or suggestions would be much appreciated :) Many thanks!
from collections import Counter
import urllib.request
import csv
import os

with open('name1.csv') as csvfile:  # need to add multiple .csv files here.
    reader = csv.DictReader(csvfile)
    title_counts = Counter()
    for row in reader:
        name, ext = os.path.splitext(row['link'])
        title = row['title']
        title_counts[title] += 1
        # need to create a folder for each CSV file with the download inside.
        title_filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-')
        urllib.request.urlretrieve(row['link'], title_filename)
CodePudding user response:
You need to add an outer loop that iterates over the files in a specific folder. You can use either os.listdir(), which returns a list of all entries, or glob.iglob() with the *.csv pattern to get only files with the .csv extension.
There are also some minor improvements you can make to your code. You're using Counter in a way that can be replaced with defaultdict or even a plain dict. Also, urllib.request.urlretrieve() is part of a legacy interface which might get deprecated, so you can replace it with a combination of urllib.request.urlopen() and shutil.copyfileobj().
Finally, to create a folder you can use os.mkdir(), but first you need to check whether the folder already exists using os.path.isdir(); this is required to prevent a FileExistsError exception.
Full code:
from os import mkdir
from os.path import join, splitext, isdir
from glob import iglob
from csv import DictReader
from collections import defaultdict
from urllib.request import urlopen
from shutil import copyfileobj

csv_folder = r"/some/path"
glob_pattern = "*.csv"

# Outer loop: one iteration per CSV file found in csv_folder.
for file in iglob(join(csv_folder, glob_pattern)):
    with open(file) as csv_file:
        reader = DictReader(csv_file)
        # The save folder is the CSV path without its extension.
        save_folder, _ = splitext(file)
        if not isdir(save_folder):
            mkdir(save_folder)
        title_counter = defaultdict(int)
        for row in reader:
            url = row["link"]
            title = row["title"]
            title_counter[title] += 1
            _, ext = splitext(url)
            save_filename = join(save_folder, f"{title}_{title_counter[title]}{ext}")
            with urlopen(url) as req, open(save_filename, "wb") as save_file:
                copyfileobj(req, save_file)
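The file-naming step used in the code above can also be pulled out into a small helper, which makes it easier to check on its own. The following is only a sketch: make_namer and next_name are hypothetical names, the '/' replacement is taken from the question's script, and stripping the URL query string before reading the extension is an added assumption that may not match every link.

```python
import os
from collections import defaultdict
from urllib.parse import urlsplit


def make_namer():
    """Return a function that builds numbered file names per title (hypothetical helper)."""
    counts = defaultdict(int)

    def next_name(title, url):
        # Count how many times this title has been seen so far.
        counts[title] += 1
        # Assumption: take the extension from the URL path only, ignoring any ?query part.
        _, ext = os.path.splitext(urlsplit(url).path)
        # Replace '/' so the title cannot be mistaken for a sub-folder.
        safe_title = title.replace("/", "-")
        return f"{safe_title}_{counts[title]}{ext}"

    return next_name
```

One namer would be created per CSV file, so that the numbering restarts in each folder.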
CodePudding user response:
For 1: just loop over a list of the file names you want. You can get that list with os.listdir(path), which returns the entries contained in path (in your case, a folder containing the CSV files).
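A minimal sketch of that idea, assuming the CSV files sit directly inside one folder. The function name list_csv_files is hypothetical, and the filtering by extension is an addition, since os.listdir() returns every entry, not only CSV files.

```python
import os


def list_csv_files(path):
    """Return the full paths of the .csv files directly inside `path`, sorted by name."""
    return sorted(
        os.path.join(path, entry)
        for entry in os.listdir(path)
        # Keep regular files only, and only those ending in .csv (case-insensitive).
        if entry.lower().endswith(".csv")
        and os.path.isfile(os.path.join(path, entry))
    )
```

The returned list can then be used in a for loop in place of the hard-coded open('name1.csv').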
CodePudding user response:
If I understood correctly, you can combine os.makedirs and os.path.join:
>>> import os
>>> basepath = '/tmp'
>>> title_name = 'name1.ext'
>>> os.makedirs(os.path.join(basepath, title_name), exist_ok=True)
>>> filepath = os.path.join(basepath, title_name, title_name)
>>> print(filepath)
/tmp/name1.ext/name1.ext
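To get a folder named name1 rather than name1.csv, the extension can be dropped with os.path.splitext() before calling os.makedirs(). This is a sketch under that assumption, and folder_for_csv is a hypothetical helper name.

```python
import os


def folder_for_csv(csv_path):
    """Create (if needed) and return a folder named after the CSV file, minus its extension."""
    # 'some/dir/name1.csv' -> 'some/dir/name1'
    folder, _ = os.path.splitext(csv_path)
    # exist_ok=True avoids FileExistsError when the folder is already there.
    os.makedirs(folder, exist_ok=True)
    return folder
```

Each download path can then be built with os.path.join(folder, filename).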