I would like to download all the txt files from a directory that is located on an FTP server. The directory has a very complex structure consisting of many sub-directories, and each sub-directory has an md5sum.txt file.
How can I write a script that recursively downloads the md5sum.txt file from each sub-directory? To be more precise:
- the directory is: http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/
- a sub-directory with a txt file: http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/
CodePudding user response:
You should be able to traverse this FTP server in the style of Python's os.walk, although os.walk itself only handles paths on the local file system, so you cannot call it on the server directly. This GitHub repo implements os.walk for FTP servers. As far as I understand, you create a new object with the path of your FTP server, but you may have to modify its Walk function to check whether the file name is md5sum.txt and, if it is, download it. You could use something like the Python requests library to download the file (the URLs in the question are served over HTTP).
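To make the idea concrete, here is a minimal sketch of an os.walk-style generator built on the standard library's ftplib. It is not the code from the repo mentioned above. The names ftp_walk, find_files and main are made up for illustration, and it assumes the server supports the MLSD command and allows anonymous login, neither of which I have verified for this host.

```python
import posixpath
from ftplib import FTP


def ftp_walk(ftp, top):
    """Yield (dirpath, dirnames, filenames) for top and everything below it."""
    dirnames, filenames = [], []
    # mlsd() yields (name, facts) pairs; the "type" fact says what the entry is.
    # Entries typed "cdir" and "pdir" (current/parent directory) are skipped.
    for name, facts in ftp.mlsd(top):
        kind = facts.get("type", "").lower()
        if kind == "dir":
            dirnames.append(name)
        elif kind == "file":
            filenames.append(name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from ftp_walk(ftp, posixpath.join(top, name))


def find_files(ftp, top, wanted="md5sum.txt"):
    """Return the remote paths of every file called `wanted` below top."""
    return [
        posixpath.join(dirpath, wanted)
        for dirpath, _dirs, files in ftp_walk(ftp, top)
        if wanted in files
    ]


def main():
    # Host and path come from the URLs in the question.
    # Assumption: the same tree is reachable over anonymous FTP.
    with FTP("ftp.ebi.ac.uk") as ftp:
        ftp.login()
        for path in find_files(ftp, "/pub/databases/spot/eQTL/sumstats"):
            print(path)
```

Calling main() connects to the server and prints the remote path of each md5sum.txt it finds; each path can then be fetched with ftp.retrbinary or over HTTP.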
CodePudding user response:
Start here: Downloading a directory tree with ftplib
Just add a filter for your specific file name before the file download:
# inside the loop over directory entries; needs "import os" at the top of the script
if file == "md5sum.txt":
    with open(os.path.join(destination, file), "wb") as f:
        ftp.retrbinary("RETR " + file, f.write)
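For context, this is roughly how the filter could sit inside a complete recursive download. Treat it as a sketch, not the code from the linked answer: the function name download_named_files is made up, remote_dir is assumed to be an absolute path, and an entry is treated as a directory when cwd into it succeeds.

```python
import os
import posixpath
from ftplib import FTP, error_perm


def download_named_files(ftp, remote_dir, destination, wanted="md5sum.txt"):
    """Mirror every file called `wanted` under remote_dir into destination.

    Returns the list of local paths that were written.
    """
    saved = []
    ftp.cwd(remote_dir)
    try:
        entries = ftp.nlst()
    except error_perm:
        entries = []  # some servers answer 550 for an empty directory
    for entry in entries:
        # Some servers return full paths from NLST, so keep only the last part.
        name = posixpath.basename(entry.rstrip("/"))
        if name in ("", ".", ".."):
            continue
        remote_path = posixpath.join(remote_dir, name)
        try:
            ftp.cwd(remote_path)  # only succeeds for directories
            is_dir = True
        except error_perm:
            is_dir = False
        if is_dir:
            saved += download_named_files(
                ftp, remote_path, os.path.join(destination, name), wanted
            )
        elif name == wanted:
            # Create the local sub-directory only when there is a file to store.
            os.makedirs(destination, exist_ok=True)
            local_path = os.path.join(destination, name)
            with open(local_path, "wb") as f:
                ftp.retrbinary("RETR " + remote_path, f.write)
            saved.append(local_path)
    return saved


def main():
    # Host and path come from the URLs in the question.
    # Assumption: the server accepts anonymous FTP logins.
    with FTP("ftp.ebi.ac.uk") as ftp:
        ftp.login()
        download_named_files(ftp, "/pub/databases/spot/eQTL/sumstats", "sumstats")
```

Calling main() would write the files under a local sumstats folder that mirrors the remote layout. The cwd probe costs one round trip per entry, so on a large tree it may be worth skipping it for names with an obvious file extension.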