I am getting a "[Errno 2] No such file or directory" error when trying to download files from an FTP server to a Pandas dataframe. The files are in the root directory of the FTP.
I am guessing that the pd.read_csv()
function is looking at my local file system... i.e. at the local path were the script resides.... But I do not understand how to change this.
import ftplib

import pandas as pd

def fetch_files(site, username, password, directory: str = '/', filematch: str = '*.csv'):
    # pass the url without protocol
    with ftplib.FTP(site) as ftp:
        # pass credentials if anonymous access is not allowed
        ftp.login(username, password)
        ftp.cwd(directory)
        list_ = []
        for file_ in ftp.nlst(filematch):
            print(file_)  # This works
            df = pd.read_csv(file_, index_col=None, header=0)  # This fails
            list_.append(df)
Or would I have to use the ftp.retrlines() method? If so, what is the difference between the LIST and MLSD commands?
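For context, this is roughly how I understand the two calls would be used (a sketch only, not code I have run; ftp is the connected object from above):

# LIST returns human-readable text lines whose exact format depends on the server
ftp.retrlines('LIST', print)
# MLSD returns machine-readable (name, facts) pairs
for name, facts in ftp.mlsd():
    print(name, facts.get('type'), facts.get('size'))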
On a side note: the CSV files have HTML entities in them, like &amp;, which screws up the SQL bulk insert. That's the reason I am reading them into a dataframe: to change the encoding and merge the individual files. Would it be faster to do this directly via the Python csv module?
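What I have in mind with the csv module is roughly this (a sketch only; the helper name clean_csv_text is my own, and cell-by-cell unescaping is an assumption about what the cleanup needs):

import csv
import html
from io import StringIO

def clean_csv_text(raw_text):
    # Unescape HTML entities such as &amp; in every cell
    for row in csv.reader(StringIO(raw_text)):
        yield [html.unescape(cell) for cell in row]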
Thank you in advance
CodePudding user response:
Use FTP.retrbinary and BytesIO to download the file to memory and then pass the in-memory file-like object to read_csv:
from io import BytesIO

flo = BytesIO()
ftp.retrbinary('RETR ' + file_, flo.write)
flo.seek(0)
pd.read_csv(flo, ...)
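Put together with the loop from your question, a minimal sketch could look like this (fetch_files and its parameters are your names; the UTF-8 decoding, the html.unescape step for the &amp; entities, and the final pd.concat merge are assumptions about your data and goal):

import ftplib
import html
from io import BytesIO, StringIO

import pandas as pd

def fetch_files(site, username, password, directory='/', filematch='*.csv'):
    frames = []
    with ftplib.FTP(site) as ftp:
        ftp.login(username, password)
        ftp.cwd(directory)
        for file_ in ftp.nlst(filematch):
            # Download the file into memory
            flo = BytesIO()
            ftp.retrbinary('RETR ' + file_, flo.write)
            # Decode and unescape HTML entities like &amp; (assuming UTF-8)
            text = html.unescape(flo.getvalue().decode('utf-8'))
            frames.append(pd.read_csv(StringIO(text), index_col=None, header=0))
    # Merge the individual files into one dataframe
    return pd.concat(frames, ignore_index=True)

You would then call it as, e.g., df = fetch_files('ftp.example.com', username, password) (hostname hypothetical).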
Similar question: Reading files from FTP server to DataFrame in Python
The above loads the whole CSV file into memory and only then parses it. If you want to parse the file as it downloads, that would probably require implementing a smart custom file-like object, which is not easy.
For a question that does something similar, see my answer to:
Get files names inside a zip file on FTP server without downloading whole archive.