Hi, I have data like this:
import datetime

data = [
    {'name': 'root/folder1/asd/file1.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47)},
]
Now I want to find, for each directory (the path up to the file name), the file with the latest last_modified date, together with a count of how many files share that directory, as shown below.
Expected result:
data = [
    {'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47), 'count': 2},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47), 'count': 2},
]
Both files under root/folder1/asd share the same path, file2 has the latest last_modified, and the count is 2 because two files were found in that directory. The same applies to the second record: both files are under root/folder1/folder2/folder3, the 2023 one is the latest, and the count is 2. If a directory contains only one file, that file is kept as the latest, with a count of 1.
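To make "same path until the file name" concrete, the grouping key is simply the directory part of each name. A minimal sketch, assuming names always use '/' as the separator (as in the sample data); `group_key` is a hypothetical helper name, and `posixpath` is used so '/' is treated as the separator on any operating system.

```python
import posixpath


def group_key(name: str) -> str:
    # Hypothetical helper: everything before the final '/', i.e. the directory.
    # posixpath always splits on '/', regardless of the OS running the code.
    return posixpath.dirname(name)


print(group_key('root/folder1/asd/file2.csv'))
print(group_key('root/folder1/folder2/folder3/new.csv'))
```

Two records belong to the same group exactly when their keys are equal.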
What I've tried so far:
sorted_files = sorted(data, key=lambda d: d['last_modified'], reverse=True)
This sorts the files, but I'm not sure how to drop the older duplicates, keep the files that have no duplicates, and add the count to each record.
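The sorting attempt can be finished with a single pass: once the list is newest-first, the first file seen for a directory is its latest one, and every later file in that directory only increments the count. A minimal sketch, assuming '/'-separated names as in the sample data; the `latest` dictionary name is illustrative, and records are copied so `data` itself is not modified.

```python
import datetime
import posixpath

data = [
    {'name': 'root/folder1/asd/file1.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47)},
]

# Newest first, so the first record seen for each directory is the latest one.
sorted_files = sorted(data, key=lambda d: d['last_modified'], reverse=True)

latest = {}  # directory -> copy of its newest record, plus a running 'count'
for item in sorted_files:
    directory = posixpath.dirname(item['name'])
    if directory in latest:
        # An older file in a directory we've already seen: just count it.
        latest[directory]['count'] += 1
    else:
        # First (newest) file for this directory: keep a copy with count 1.
        latest[directory] = {**item, 'count': 1}

result = list(latest.values())
print(result)
```

Because the dictionary preserves insertion order, `result` lists directories in order of their newest file, most recent first.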
Answer:
Use a dictionary keyed by directory name (os.path.dirname of each file's path). While filling the dictionary, compare each file's last_modified with the one already stored for that directory, keep whichever is newer, and increment the directory's count either way.
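import os

data_dict = {}
for item in data:
    # Group key: the path up to (but not including) the file name
    directory = os.path.dirname(item['name'])
    if directory not in data_dict:
        # First file seen in this directory; copy it so `data` is not modified
        data_dict[directory] = {**item, 'count': 1}
    else:
        count = data_dict[directory]['count'] + 1
        # Keep whichever file was modified most recently
        if item['last_modified'] > data_dict[directory]['last_modified']:
            data_dict[directory] = dict(item)
        data_dict[directory]['count'] = count
result = list(data_dict.values())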
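A variation on the same idea, not part of the original answer: collect all records per directory first with `collections.defaultdict`, then pick the newest one in each group with `max()` and use the group size as the count. A sketch under the same assumption of '/'-separated names; `latest_per_directory` is a hypothetical helper name.

```python
import datetime
import posixpath
from collections import defaultdict


def latest_per_directory(records):
    """Return the newest record per directory, each with a 'count' of its files."""
    # Collect every record under its directory (path up to the file name).
    groups = defaultdict(list)
    for item in records:
        groups[posixpath.dirname(item['name'])].append(item)

    # Pick the newest record in each group and attach the group size.
    # max() returns the first of several equal maxima, so ties keep input order.
    return [
        {**max(files, key=lambda d: d['last_modified']), 'count': len(files)}
        for files in groups.values()
    ]


data = [
    {'name': 'root/folder1/asd/file1.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/asd/file2.csv', 'last_modified': datetime.datetime(2025, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2022, 8, 15, 11, 45, 47)},
    {'name': 'root/folder1/folder2/folder3/new.csv', 'last_modified': datetime.datetime(2023, 8, 15, 11, 45, 47)},
]

result = latest_per_directory(data)
for row in result:
    print(row['name'], row['last_modified'], row['count'])
```

This keeps every record in memory until grouping is done, whereas the dictionary loop above keeps only one record per directory; in exchange, the "latest" and "count" logic is each a single expression. A directory with one file naturally gets a count of 1.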