How to find the last three files added in a directory


I'm using this code to find the most recently added csv file, but I'm not able to find the last 3 files added. I could eliminate the newest file and then find the max again, but that seems too cumbersome. Can you please help me find a solution? All I need is the last 3 csv files added to a directory.

import os

import pandas as pd

t = []

j_csvs = "path2"

# Find all csv file paths and collect them in t
for root, dirs, files in os.walk(j_csvs):
    for file in files:
        if file.endswith(".csv"):
            t.append(os.path.abspath(os.path.join(root, file)))

latest_f_j = max(t, key=os.path.getctime)
df = pd.read_csv(latest_f_j)
df
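To avoid repeatedly removing the max, `heapq.nlargest` can pick the n newest paths in one pass. A minimal sketch of that idea, using a throwaway temp directory and placeholder file names so it runs anywhere:

```python
import heapq
import os
import tempfile
import time

# Build a throwaway directory with four csv files (placeholder names).
tmp = tempfile.mkdtemp()
paths = []
for name in ["a.csv", "b.csv", "c.csv", "d.csv"]:
    p = os.path.join(tmp, name)
    with open(p, "w") as fh:
        fh.write("x\n")
    paths.append(p)
    time.sleep(0.01)  # space out the timestamps

# nlargest returns the 3 paths with the greatest ctime, newest first,
# without repeatedly finding and removing the current maximum.
latest_three = heapq.nlargest(3, paths, key=os.path.getctime)
print(latest_three)
```

In the question's code, `paths` would be the collected list `t`.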

CodePudding user response:

You cannot determine with certainty which 3 files were the last to be added.

At the user level, a system may present files ordered by date, file type, size, or name (case-sensitive or not).

Date order is unreliable: timestamps can be manipulated, and a file moved into a directory can keep its original date and time details.

At a lower level, as seen by the file system, entries are generally unordered; the OS stores them however it sees fit.

So there is no reliable way to determine which 3 files were added last. Well, there is one: run a watch on the directory that fires when a file is added, and keep a circular list of 3, replacing the oldest entry each time the trigger fires.
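A minimal polling sketch of that watcher idea, stdlib only (a real watcher would use filesystem events, e.g. inotify or the third-party watchdog library, instead of polling; the function name and parameters here are illustrative):

```python
import os
import time
from collections import deque


def watch_for_new_csvs(directory, polls=None, interval=1.0):
    """Poll `directory` and keep the last three csv names seen arriving.

    `polls` caps the number of iterations so this sketch can terminate;
    a real watcher would loop forever on filesystem events.
    """
    seen = set(os.listdir(directory))
    last_three = deque(maxlen=3)  # circular list of the 3 newest arrivals
    count = 0
    while polls is None or count < polls:
        time.sleep(interval)
        current = set(os.listdir(directory))
        for name in sorted(current - seen):
            if name.endswith(".csv"):
                last_three.append(name)  # oldest entry drops off automatically
        seen = current
        count += 1
    return list(last_three)
```

`deque(maxlen=3)` implements the circular list: appending a fourth name silently evicts the oldest.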

CodePudding user response:

Maybe you could use os.path.getmtime:

import pathlib
import os


def last_n_modified_files_in_dir(path_to_dir: pathlib.Path, file_ext: str,
                                 n: int) -> list[str]:
    # Sort by modification time, newest first, and keep the first n names.
    files = sorted(path_to_dir.glob(f'*.{file_ext}'),
                   key=os.path.getmtime, reverse=True)
    return [f.name for f in files[:n]]


def main() -> None:
    desktop_path = pathlib.Path('/Users/shash/Desktop')
    print(last_n_modified_files_in_dir(desktop_path, 'csv', n=3))


if __name__ == '__main__':
    main()

Output:

['apples.csv', 'bananas.csv', 'carrots.csv']

Use rglob instead of glob if you want to check subdirectories as well.
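For example, a recursive variant only changes the glob call (the function name is illustrative):

```python
import os
import pathlib


def last_n_modified_files_recursive(path_to_dir: pathlib.Path, file_ext: str,
                                    n: int) -> list[str]:
    # rglob descends into subdirectories; glob stays at the top level.
    files = sorted(path_to_dir.rglob(f'*.{file_ext}'),
                   key=os.path.getmtime, reverse=True)
    return [f.name for f in files[:n]]
```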

CodePudding user response:

Use sorted with a key function to define the ordering; some possibilities:

  • with os.path.getctime for the system's ctime (system dependent, see the docs)
  • with os.path.getmtime for the time of last modification
  • with os.path.getatime for the time of last access.

Pass the reverse=True parameter for a result in descending order and then slice.

import os.path

def last_newest_files(path, ref_ext='csv', amount=3):
    # return files ordered by newest to oldest

    def f_conditions(file_path):
        # keep regular files whose extension matches ref_ext
        _, ext = os.path.splitext(file_path)  # ext starts with ".", e.g. ".csv"
        return os.path.isfile(file_path) and ext.lstrip('.') == ref_ext

    # apply conditions
    filtered_files = filter(f_conditions, (os.path.join(path, basename) for basename in os.listdir(path)))

    # get the newest
    return sorted(filtered_files, key=os.path.getctime, reverse=True)[:amount]


path_dir = '.'
ext = 'csv'
last_n_files = 3

print(*last_newest_files(path_dir, ext, last_n_files), sep='\n')