Delete redundant files in Python based on filename

I have a lot of log files with the same name but different dates following the name. The filename structure is as follows: filename_xyz_year-mm-dd

I want to delete the older files, but I'm struggling to split the filename into the string part (the name) and the value part (the date of the last run). Is there a way to group files with the same name and split the date off, then sort by date and delete the older logs while keeping the 5 most recent?

overflow_job_2022-01-10
overflow_job_2022-01-12
overflow_job_2022-01-15
overflow_job_2022-01-19
overflow_job_2022-02-01
overflow_job_2022-02-05
overflow_job_2022-02-08

I currently have all the files listed with listdir. How do I separate, sort, and then delete them? This would run regularly, so I was thinking of putting the filenames in a list and sorting it.

Thanks in advance.

CodePudding user response：

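Sorting that list will put the files in date order; no splitting is necessary. If you do want to split the date off, you can use name.rsplit('_', 1). The second argument says to stop after one split, and rsplit starts from the right, so only the trailing date is separated from the name.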
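
As a minimal sketch of both points, assuming the underscore-separated names from the question (the list is a shuffled subset of the question's examples):

```python
names = [
    "overflow_job_2022-02-01",
    "overflow_job_2022-01-10",
    "overflow_job_2022-02-08",
    "overflow_job_2022-01-19",
]

# ISO dates (YYYY-MM-DD) sort chronologically as plain strings,
# so a plain sort puts the oldest file first.
ordered = sorted(names)

# rsplit with maxsplit=1 splits only at the last underscore,
# leaving underscores inside the name untouched.
prefix, date = ordered[-1].rsplit("_", 1)
print(prefix, date)
```

Here ordered[-1] is the most recent file, so prefix and date describe the newest log.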

CodePudding user response：

Provided that the leading part (the name) of the filenames is the same, a plain sorted() is enough, because dates in ISO format (YYYY-MM-DD) sort lexicographically in chronological order. Everything except the last five entries of the sorted list is then an older file:

for file in sorted(files)[:-5]:
    print(file)

Prints:

overflow_job-2022-01-10
overflow_job-2022-01-12
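
The lexicographic shortcut only holds while every name ends in a zero-padded YYYY-MM-DD date. If that is in doubt, one option is to sort on the parsed date instead. This is a sketch assuming the underscore format from the question; the helper name date_key and the sample names are hypothetical:

```python
from datetime import date, datetime


def date_key(filename):
    """Return the trailing YYYY-MM-DD of a name as a date object.

    Raises ValueError if the part after the last underscore is not a date.
    """
    _, stamp = filename.rsplit("_", 1)
    return datetime.strptime(stamp, "%Y-%m-%d").date()


# Hypothetical names; the second one is not zero-padded,
# so a plain string sort would misplace it.
files = ["overflow_job_2022-10-01", "overflow_job_2022-9-30"]
oldest_first = sorted(files, key=date_key)
```

A malformed name fails loudly here rather than being silently sorted into the wrong place, which may be preferable for a job that deletes files.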

UPDATE: If there are files with different leading names (as per jenkinsmcgee's comment), use a dictionary to group the files that share the same leading name, then sort each group separately:

files = """overflow_job-2022-01-10
overflow_job-2022-01-12
overflow_job-2022-01-15
overflow_job-2022-01-19
overflow_job-2022-02-01
overflow_job-2022-02-05
overflow_job-2022-02-08""".splitlines()

d = dict()

for item in files:
    name, _ = item.split('-',1)
    d.setdefault(name, []).append(item)

for array in d.values():
    for file in sorted(array)[:-5]:
        print('to delete:', file)

UPDATE2: As per your remark that your initial post contained a mistake and the correct format is filename_xyz_2022-02-05, update the loop to split on the last underscore instead:

for item in files:
    name, _ = item.rsplit('_',1)
    d.setdefault(name, []).append(item)
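
Putting the pieces together with listdir and the actual deletion, a hedged sketch could look like the following. It assumes the filename_xyz_YYYY-MM-DD format with no file extension; the function name prune_logs and its keep and dry_run parameters are hypothetical, not part of the answers above:

```python
import os
from collections import defaultdict


def prune_logs(directory, keep=5, dry_run=True):
    """Delete all but the `keep` newest files per leading name.

    Returns the sorted names that were (or, with dry_run, would be) deleted.
    """
    groups = defaultdict(list)
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        # Skip subdirectories and names that have no underscore to split on.
        if not os.path.isfile(path) or "_" not in entry:
            continue
        name, _date = entry.rsplit("_", 1)
        groups[name].append(entry)

    removed = []
    for entries in groups.values():
        ordered = sorted(entries)  # ISO dates sort oldest first
        # Guard keep=0, because ordered[:-0] would be an empty list.
        stale = ordered[:-keep] if keep > 0 else ordered
        for entry in stale:
            if not dry_run:
                os.remove(os.path.join(directory, entry))
            removed.append(entry)
    return sorted(removed)
```

Calling prune_logs(path) first with the default dry_run=True shows what would go, and prune_logs(path, dry_run=False) performs the deletion. Since the job runs regularly, keeping the dry run as the default is a deliberate safety choice.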