I am getting a "[Errno 2] No such file or directory" error when trying to download files from an FTP server to a Pandas dataframe. The files are in the root directory of the FTP.
I am guessing that the pd.read_csv()
function is looking at my local file system... i.e. at the local path were the script resides.... But I do not understand how to change this.
import ftplib

import pandas as pd

def fetch_files(site, username, password, directory: str = '/', filematch: str = '*.csv'):
    # pass the url without protocol
    with ftplib.FTP(site) as ftp:
        # pass credentials if anonymous access is not allowed
        ftp.login(username, password)
        ftp.cwd(directory)
        list_ = []
        for file_ in ftp.nlst(filematch):
            print(file_)  # This works
            df = pd.read_csv(file_, index_col=None, header=0)  # This fails
            list_.append(df)
Or would I have to use the ftp.retrlines() method? If so, what is the difference between the LIST and MLSD commands?
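For context, this is roughly how I understand the two calls would be used (a sketch only, not code I have run; ftp is the connected object from above):

# LIST returns human-readable text lines whose exact format depends on the server
ftp.retrlines('LIST', print)
# MLSD returns machine-readable (name, facts) pairs
for name, facts in ftp.mlsd():
    print(name, facts.get('type'), facts.get('size'))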
On a side note: the CSV files have HTML entities in them, like &amp;, which screws up the SQL bulk insert. That's the reason I am reading them into a dataframe: to change the encoding and merge the individual files. Would it be faster to do this directly via the Python csv module?
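What I have in mind with the csv module is roughly this (a sketch only; the helper name clean_csv_text is my own, and cell-by-cell unescaping is an assumption about what the cleanup needs):

import csv
import html
from io import StringIO

def clean_csv_text(raw_text):
    # Unescape HTML entities such as &amp; in every cell
    for row in csv.reader(StringIO(raw_text)):
        yield [html.unescape(cell) for cell in row]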
Thank you in advance
CodePudding user response:
Use FTP.retrbinary and BytesIO to download the file to memory and then pass the in-memory file-like object to read_csv:
from io import BytesIO

flo = BytesIO()
ftp.retrbinary('RETR ' + file_, flo.write)
flo.seek(0)
pd.read_csv(flo, ...)
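Put together with the loop from your question, a minimal sketch could look like this (fetch_files and its parameters are your names; the UTF-8 decoding, the html.unescape step for the &amp; entities, and the final pd.concat merge are assumptions about your data and goal):

import ftplib
import html
from io import BytesIO, StringIO

import pandas as pd

def fetch_files(site, username, password, directory='/', filematch='*.csv'):
    frames = []
    with ftplib.FTP(site) as ftp:
        ftp.login(username, password)
        ftp.cwd(directory)
        for file_ in ftp.nlst(filematch):
            # Download the file into memory
            flo = BytesIO()
            ftp.retrbinary('RETR ' + file_, flo.write)
            # Decode and unescape HTML entities like &amp; (assuming UTF-8)
            text = html.unescape(flo.getvalue().decode('utf-8'))
            frames.append(pd.read_csv(StringIO(text), index_col=None, header=0))
    # Merge the individual files into one dataframe
    return pd.concat(frames, ignore_index=True)

You would then call it as, e.g., df = fetch_files('ftp.example.com', username, password) (hostname hypothetical).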
Similar question: Reading files from FTP server to DataFrame in Python
The above loads the whole CSV file into memory and only then parses it. If you want to parse the file as it downloads, that would probably require implementing a smart custom file-like object, which is not easy.
For a question that does something similar, see my answer to:
Get files names inside a zip file on FTP server without downloading whole archive.