Delete redundant filenames in python based on filename_xyz-year-mm-dd

Time:02-10

I have a lot of log files with the same name but a different date after the name. The filename structure is as follows: filename_xyz_year-mm-dd

I want to delete the older files, but I'm struggling to split the filename into the string part (the name) and the value part (the date of the last run). Is there a way to group files with the same name and split the date off, then sort by date and delete the older logs while keeping the 5 most recent?

overflow_job_2022-01-10
overflow_job_2022-01-12
overflow_job_2022-01-15
overflow_job_2022-01-19
overflow_job_2022-02-01
overflow_job_2022-02-05
overflow_job_2022-02-08

I currently have all the files listed with listdir. How do I separate, sort, and then delete them? This would run regularly; I was thinking of putting the filenames in an array and sorting it.

Thanks in advance.

CodePudding user response:

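Sorting that list will put them all in date order, no splitting necessary. If you do want to split the date off, you can use name.rsplit('_',1). The second parameter says to stop after one split, and rsplit starts from the right, so only the date is separated from the name.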
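As a minimal sketch of the split for the underscore format given in the question (filename_xyz_year-mm-dd): splitting once from the right at the last underscore separates the name from the date. The variable names are only illustrative, and parsing the date with datetime is an optional extra, not something the answer requires.

```python
from datetime import datetime

name = "overflow_job_2022-01-10"

# rsplit with maxsplit=1 splits once, starting from the right,
# so underscores inside the job name are left alone.
prefix, date_text = name.rsplit("_", 1)

# Optional: parsing rejects names whose suffix is not a real date.
run_date = datetime.strptime(date_text, "%Y-%m-%d").date()

print(prefix)    # overflow_job
print(run_date)  # 2022-01-10
```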

CodePudding user response:

Provided that the leading part (the name) of the filenames is the same, a plain sorted() is enough, because dates in ISO format (YYYY-MM-DD) sort lexicographically in chronological order.

# files is the list of filenames, e.g. the result of listdir
for file in sorted(files)[:-5]:  # everything except the 5 newest
    print(file)

Prints:

overflow_job-2022-01-10
overflow_job-2022-01-12
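To illustrate why the plain string sort is safe, here is a small check, assuming the underscore format from the question, that sorting the names as text gives the same order as sorting them by their parsed date. The helper run_date is a hypothetical name introduced only for this comparison.

```python
from datetime import datetime

# Deliberately shuffled sample names in the question's format.
names = [
    "overflow_job_2022-02-05",
    "overflow_job_2022-01-10",
    "overflow_job_2022-02-08",
    "overflow_job_2022-01-19",
    "overflow_job_2022-01-12",
    "overflow_job_2022-02-01",
    "overflow_job_2022-01-15",
]


def run_date(name):
    """Parse the date after the last underscore (hypothetical helper)."""
    return datetime.strptime(name.rsplit("_", 1)[1], "%Y-%m-%d")


by_text = sorted(names)                # lexicographic order
by_date = sorted(names, key=run_date)  # chronological order

print(by_text == by_date)  # True
print(by_text[:-5])        # the two oldest names
```

This only holds because the month and day are zero-padded; a name such as overflow_job_2022-1-5 would need the date-based key.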

UPDATE: If there are files with different leading names (as per jenkinsmcgee's comment), use a dictionary to group the files that share a leading name.

files = """overflow_job-2022-01-10
overflow_job-2022-01-12
overflow_job-2022-01-15
overflow_job-2022-01-19
overflow_job-2022-02-01
overflow_job-2022-02-05
overflow_job-2022-02-08""".splitlines()

d = dict()

for item in files:
    name, _ = item.split('-',1)
    d.setdefault(name, []).append(item)

for array in d.values():
    for file in sorted(array)[:-5]:
        print('to delete:', file)

UPDATE2: As per your remark that the initial post had a mistake and the correct format is filename_xyz_2022-02-05, update the loop to:

for item in files:
    name, _ = item.rsplit('_',1)
    d.setdefault(name, []).append(item)
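The snippets above only print the candidates. A possible sketch of the full job, grouping by leading name and then actually deleting, could look like the following. The function prune_logs and its parameters keep and dry_run are hypothetical names, not part of the answer, and the sketch assumes the files have no extension after the date, as in the question's listing.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def prune_logs(directory, keep=5, dry_run=True):
    """Return files older than the `keep` newest per leading name.

    The files are only deleted when dry_run is False.
    """
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        # Split once at the last underscore: name on the left, date on the right.
        prefix, sep, date_text = path.name.rpartition("_")
        if not sep:
            continue  # no underscore, so not one of our logs
        try:
            run_date = datetime.strptime(date_text, "%Y-%m-%d")
        except ValueError:
            continue  # suffix is not a date; leave the file alone
        groups[prefix].append((run_date, path))

    removed = []
    for entries in groups.values():
        entries.sort(key=lambda entry: entry[0])  # oldest first
        # Everything before the last `keep` entries is considered old.
        for _, path in entries[:max(len(entries) - keep, 0)]:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling prune_logs(some_directory) first with the default dry_run=True shows what would be removed; passing dry_run=False performs the deletion.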