Group a list of objects by matching path

Hi, I have data like this:

import datetime

data = [{'name': 'root/folder1/asd/file1.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
        {'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47)},
        {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
        {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47)}]

For each group of files that share the same path up to the file name, I want to keep only the file with the latest last_modified date, and add the number of files in that group, as shown below.

Expected result :

data = [{'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47), 'count': 2},
        {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47), 'count': 2}]

In the first record, both files share the path root/folder1/asd, file2.csv has the latest last_modified, and count is 2 because two files share that path. Likewise, in the second record both files share the path root/folder1/folder2/folder3, the one with the latest last_modified is kept, and count is 2. If a path has only one file, that file is treated as the latest.

What I have tried so far:

sorted_files = sorted(data, key=lambda d: (d.get('last_modified', {})), reverse=True)

This sorts the files, but I am not sure how to drop the older entries, keep the entries that have no duplicates, and add the count to each one.

CodePudding user response：

Use a dictionary whose keys are the directory names. Then when updating the dictionary, you can check if the current last_modified is greater than the previous one for that directory.

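import os

data_dict = {}
for item in data:
    # Group by everything before the file name.
    directory = os.path.dirname(item['name'])
    if directory not in data_dict:
        # First file seen in this directory.
        item['count'] = 1
        data_dict[directory] = item
    else:
        # Carry the running count over, even if the stored item is replaced.
        count = data_dict[directory]['count'] + 1
        if item['last_modified'] > data_dict[directory]['last_modified']:
            data_dict[directory] = item
        data_dict[directory]['count'] = count

result = list(data_dict.values())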
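
For reference, here is a self-contained variant of the same dictionary idea that can be run against the sample data from the question. It is only a sketch: the function name latest_per_directory is made up for illustration, and it uses posixpath.dirname on the assumption that the names always use forward slashes. Unlike the loop above, it keeps the counts in a separate dictionary and returns new dicts, so the input list is not modified.

```python
import datetime
import posixpath


def latest_per_directory(files):
    """Return the newest entry per directory, plus a 'count' of entries seen there.

    Hypothetical helper, not part of any library.
    """
    latest = {}  # directory -> newest entry seen so far
    counts = {}  # directory -> number of entries seen
    for item in files:
        # Assumption: names are forward-slash paths, so posixpath is enough.
        directory = posixpath.dirname(item['name'])
        counts[directory] = counts.get(directory, 0) + 1
        best = latest.get(directory)
        if best is None or item['last_modified'] > best['last_modified']:
            latest[directory] = item
    # Build new dicts so the caller's data is left untouched.
    return [dict(latest[d], count=counts[d]) for d in latest]


data = [
    {'name': 'root/folder1/asd/file1.csv',
     'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/asd/file2.csv',
     'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv',
     'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv',
     'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47)},
]

result = latest_per_directory(data)
```

With the sample data, result should match the expected output in the question: one entry per directory, each with count 2. A directory that contains a single file would come back with count 1.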