Download text files from a directory with a complex structure of sub-directories

Time:05-29

I would like to download all the .txt files from a directory located on an FTP server. The directory has a very complex structure consisting of many sub-directories, and each sub-directory contains an md5sum.txt file.

How can I write a script that recursively downloads the md5sum.txt file from each sub-directory? To be more precise:

- The directory is: http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/
- An example sub-directory with a .txt file: http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/Alasoo_2018/exon/
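Since the directory in the question is also reachable over HTTP, one way to collect the md5sum.txt files without using FTP at all is to crawl the HTML index pages. The following is a minimal sketch using only the standard library. It assumes the server returns plain directory listings in which every sub-directory link ends in "/" (as in Apache-style indexes), and that base_url ends with "/"; LinkParser, fetch_text and find_files are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_text(url):
    """Download a page over HTTP and decode it as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_files(base_url, target="md5sum.txt", fetch=fetch_text):
    """Crawl the listings below base_url and return the URLs of files named target.

    Assumption: sub-directory links end with "/" (Apache-style index pages).
    """
    found, stack, seen = [], [base_url], set()
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            full = urljoin(url, href)
            # Skip sort links ("?C=N;O=D"), self links and anything outside the tree
            # (e.g. the parent-directory link of the top-level page).
            if "?" in href or full == url or not full.startswith(base_url):
                continue
            if href.endswith("/"):
                stack.append(full)  # sub-directory: visit it later
            elif href.rsplit("/", 1)[-1] == target:
                found.append(full)
    return found
```

For example, find_files("http://ftp.ebi.ac.uk/pub/databases/spot/eQTL/sumstats/") would return a list of md5sum.txt URLs, each of which could then be saved with urllib.request.urlretrieve or requests.get. The crawl issues one request per directory, so on a large tree it may take a while.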

Answer:

Python's os.walk only works on local directories, so you cannot point it at this FTP server directly, but you can traverse the server with an os.walk-style function. A GitHub repository implements os.walk for FTP servers: as I understand it, you create a new object with the path of your FTP server, and you may have to modify its walk function to check whether each file name is md5sum.txt and, if so, download it. Since the same files are also served over HTTP (as the URLs in the question show), you could use something like the Python requests library to download each file.
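An os.walk-style traversal can also be written directly on top of ftplib instead of a third-party repository. The sketch below is one way to do it, assuming the server supports the MLSD command that FTP.mlsd relies on and reports the standard "type" fact. ftp_walk, find_md5sums and main are hypothetical names, and the host and path in main are taken from the question's HTTP URL on the assumption that the FTP server uses the same layout.

```python
import posixpath
from ftplib import FTP


def ftp_walk(ftp, top):
    """Yield (dirpath, dirnames, filenames) for an FTP tree, top-down like os.walk."""
    dirnames, filenames = [], []
    # Collect the whole listing before yielding, so callers can use the
    # same connection (e.g. retrbinary) inside their loop.
    for name, facts in ftp.mlsd(top):
        kind = facts.get("type", "").lower()
        if kind == "dir":
            dirnames.append(name)
        elif kind == "file":
            filenames.append(name)
        # "cdir" and "pdir" entries (the directory itself and its parent) are skipped.
    yield top, dirnames, filenames
    for d in dirnames:
        yield from ftp_walk(ftp, posixpath.join(top, d))


def find_md5sums(ftp, top):
    """Return the remote paths of every md5sum.txt below top."""
    return [
        posixpath.join(dirpath, "md5sum.txt")
        for dirpath, _dirs, files in ftp_walk(ftp, top)
        if "md5sum.txt" in files
    ]


def main(host="ftp.ebi.ac.uk", top="/pub/databases/spot/eQTL/sumstats"):
    # Host and path are assumed from the question's HTTP URL.
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        for path in find_md5sums(ftp, top):
            print(path)
```

If the server rejects MLSD, the call should fail with an ftplib error_perm exception, and a listing based on NLST and CWD is needed instead.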

Answer:

Start here: Downloading a directory tree with ftplib

Just add a filter for your specific file name before the file download:

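import os

# Only download md5sum.txt; file, destination and ftp come from the tree-download code linked above
if file == "md5sum.txt":
    with open(os.path.join(destination, file), "wb") as f:
        ftp.retrbinary("RETR " + file, f.write)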
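The snippet above relies on variables from the linked tree-download code. A self-contained sketch that combines the recursion with the md5sum.txt filter could look like the following. It assumes only plain NLST and CWD support and detects sub-directories by trying to change into each entry, which works without MLSD but costs one extra command per entry. download_matching and main are hypothetical names, the local folder eqtl_md5sums is made up, and the host and path are assumed from the question's URL.

```python
import os
from ftplib import FTP, error_perm


def download_matching(ftp, remote_dir, local_dir, target="md5sum.txt"):
    """Copy every file named target under remote_dir into local_dir, keeping the layout."""
    ftp.cwd(remote_dir)
    try:
        names = ftp.nlst()
    except error_perm:
        names = []  # some servers answer NLST on an empty directory with a 550 error
    for entry in names:
        name = entry.rsplit("/", 1)[-1]  # some servers return full paths from NLST
        if name in (".", ".."):
            continue
        remote_path = remote_dir.rstrip("/") + "/" + name
        try:
            ftp.cwd(remote_path)  # succeeds only if the entry is a directory
        except error_perm:
            # Not a directory: download it only if the name matches
            if name == target:
                os.makedirs(local_dir, exist_ok=True)
                with open(os.path.join(local_dir, name), "wb") as f:
                    ftp.retrbinary("RETR " + remote_path, f.write)
        else:
            download_matching(ftp, remote_path, os.path.join(local_dir, name), target)


def main(dest="eqtl_md5sums"):
    # Host and path assumed from the question's HTTP URL; dest is a hypothetical folder.
    with FTP("ftp.ebi.ac.uk") as ftp:
        ftp.login()  # anonymous login
        download_matching(ftp, "/pub/databases/spot/eQTL/sumstats", dest)
```

Calling main() would mirror only the md5sum.txt files into eqtl_md5sums/, with one sub-folder per remote directory. Because every entry costs a CWD attempt, this is slower than an MLSD-based listing on large trees.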