I'm using this code to find the last CSV file added, but I'm not able to find the last 3 files added. I could eliminate the last file and then find the max again, but I think that would take too long. Can you please help me find a solution? All I need is to find the last 3 CSV files added to a directory.
import os
import pandas as pd

t = []
j_csvs = "path2"
# Find all csv file paths and collect them within t
d = os.path.join(j_csvs)
for root, dirs, files in os.walk(d):
    for file in files:
        if file.endswith(".csv"):
            p = os.path.abspath(os.path.join(root, file))
            t.append(p)
latest_f_j = max(t, key=os.path.getctime)
df = pd.read_csv(latest_f_j)
df
CodePudding user response:
You cannot determine which 3 files were added last with any degree of certainty.
At the user-facing level, a system may present those files in order of date, file type, size, or name, either case-sensitively or not.
With date order, you have no way of knowing, since date stamps can be manipulated, and a pre-dated file moved into a directory preserves its original date and time details.
If you look at files at a lower level, as seen by the file system, they are generally unordered; the OS, on its own whim, stores the details as it sees fit.
So you have no reliable way of determining which 3 files were the last to be added. Well, you have one way: run a watch on the directory that fires when a file is added, and keep a circular list of 3, replacing the oldest entry each time before waiting for the next trigger to fire.
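The watch idea above can be sketched with plain polling and a fixed-size deque as the circular list; this is a minimal illustration, not a production watcher, and the function name and parameters are invented for the example:

```python
import os
import time
from collections import deque

def watch_for_new_files(directory, poll_seconds=1.0, keep=3):
    """Poll `directory` and yield the `keep` most recently arrived file names.

    A polling sketch of the "watch + circular list" approach described
    above; a real solution might use an OS-level notification library.
    """
    seen = set(os.listdir(directory))
    newest = deque(maxlen=keep)  # circular list: oldest entry falls off
    while True:
        current = set(os.listdir(directory))
        for name in sorted(current - seen):
            newest.append(name)
        seen = current
        yield list(newest)
        time.sleep(poll_seconds)
```

Each `next()` on the generator reports the current circular list, so the caller decides how often to look.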
CodePudding user response:
Maybe you could use os.path.getmtime:
import pathlib
import os

def last_n_modified_files_in_dir(path_to_dir: pathlib.Path, file_ext: str,
                                 n: int) -> list[str]:
    return [f.name for _, f in sorted([(os.path.getmtime(f), f) for f in
                                       path_to_dir.glob(f'*.{file_ext}')])][-n:][::-1]

def main() -> None:
    desktop_path = pathlib.Path('/Users/shash/Desktop')
    print(last_n_modified_files_in_dir(desktop_path, 'csv', n=3))

if __name__ == '__main__':
    main()
Output:
['apples.csv', 'bananas.csv', 'carrots.csv']
Use rglob instead of glob if you want to check subdirectories as well.
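That recursive variant can be sketched like this (the function name is illustrative; only `rglob` differs from the answer's `glob` version):

```python
import os
import pathlib

def last_n_modified_recursive(path_to_dir: pathlib.Path, n: int = 3) -> list[str]:
    # rglob('*.csv') also descends into subdirectories, unlike glob
    paths = sorted(path_to_dir.rglob('*.csv'), key=os.path.getmtime)
    return [p.name for p in paths[-n:][::-1]]  # newest first
```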
CodePudding user response:
Use sorted with a key function to define the ordering relationship. Some possibilities:

- os.path.getctime for the system's ctime (it is system-dependent, see the docs)
- os.path.getmtime for the time of last modification
- os.path.getatime for the time of last access

Pass the reverse=True parameter for a result in descending order, and then slice.
import os.path

def last_newest_files(path, ref_ext='csv', amount=3):
    # return files ordered from newest to oldest
    def f_conditions(path):
        # keep regular files with the matching extension
        _, ext = os.path.splitext(path)  # ext starts with ".", i.e. ".csv"
        return os.path.isfile(path) and ext.lstrip('.') == ref_ext

    # apply conditions
    filtered_files = filter(f_conditions, (os.path.join(path, basename) for basename in os.listdir(path)))
    # get the newest
    return sorted(filtered_files, key=os.path.getctime, reverse=True)[:amount]

path_dir = '.'
ext = 'csv'
last_n_files = 3
print(*last_newest_files(path_dir, ext, last_n_files), sep='\n')