Python list data filtering

Time:12-02

I have a list of file names, some of which are identical except for the timestamp section. Each entry has the format [name-subname-timestamp], for example:

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']

What I need is a list that holds, for every name and subname combination, the most recent file as determined by the timestamp. I have started by building the set of [name-subname] prefixes:

name_subname_list = []
for row in myList:
    name_subname_list.append((row.rpartition('-')[0]))
name_subname_list = set(name_subname_list) # {'name1-001', 'name2-002', 'name1-002'}

I'm not sure this is the right approach, and I don't know how to continue from here. Any ideas?

CodePudding user response:

The code below keeps the newest timestamp for each name-subname and then rebuilds the corresponding file name:

from datetime import datetime as dt

dic = {}  # maps 'name-subname' to its newest timestamp string
for i in myList:
    sp = i.split('-')
    name_subname = sp[0] + '-' + sp[1]
    mytime = sp[2].split('.')[0]  # drop the '.txt' extension
    if name_subname not in dic:
        dic[name_subname] = mytime
    else:
        # compare as datetimes rather than as raw strings
        if dt.strptime(mytime, "%Y%m%d%H%M") > dt.strptime(dic[name_subname], "%Y%m%d%H%M"):
            dic[name_subname] = mytime

# rebuild the full file names from the stored timestamps
result = []
for name_subname in dic:
    result.append(name_subname + '-' + dic[name_subname] + '.txt')

which outputs result as:

['name1-001-202112021010.txt',
 'name1-002-202112021010.txt',
 'name2-002-202112020811.txt']
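The same idea can be sketched as a small function that stores the original file name alongside its parsed timestamp, so nothing has to be reassembled at the end. The helper name `newest_per_group` and its `fmt` parameter are hypothetical, and the sketch assumes the timestamp is always the last dash-separated field before the extension and follows the `%Y%m%d%H%M` pattern used above.

```python
from datetime import datetime


def newest_per_group(filenames, fmt="%Y%m%d%H%M"):
    """Return the newest file per 'name-subname' prefix (hypothetical helper)."""
    newest = {}  # prefix -> (parsed timestamp, original file name)
    for filename in filenames:
        stem = filename.rsplit(".", 1)[0]        # drop the extension
        prefix, _, stamp = stem.rpartition("-")  # split off the timestamp field
        when = datetime.strptime(stamp, fmt)
        if prefix not in newest or when > newest[prefix][0]:
            newest[prefix] = (when, filename)
    # dicts keep insertion order, so groups appear in first-seen order
    return [filename for _, filename in newest.values()]


my_list = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt',
           'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
latest = newest_per_group(my_list)
print(latest)
```

Keeping the whole file name in the dictionary means the extension does not have to be hard-coded as '.txt'.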

CodePudding user response:

Try this:

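myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
dic = {}
for name in myList:
    parts = name.split('-')
    # group the 'timestamp.txt' parts under their 'name-subname' key
    dic.setdefault(parts[0] + '-' + parts[1], []).append(parts[2])

unique_list = []
for key, value in dic.items():
    # max() compares strings, so this is chronological only if all timestamps have the same width
    unique_list.append(key + '-' + max(value))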
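One thing to watch with a plain string `max()`: one timestamp in the sample list has 11 digits rather than 12, and lexicographic comparison then ranks it above a later 12-digit one. A minimal sketch of the difference, assuming the `%Y%m%d%H%M` pattern from the other answer applies (the helper `stamp_of` is hypothetical):

```python
from datetime import datetime


def stamp_of(filename):
    """Parse the trailing timestamp of a file name (hypothetical helper)."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    return datetime.strptime(stem.rpartition("-")[2], "%Y%m%d%H%M")


candidates = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt']

by_text = max(candidates)                # character-by-character comparison
by_time = max(candidates, key=stamp_of)  # comparison on parsed datetimes

print(by_text)  # name1-001-20211202811.txt
print(by_time)  # name1-001-202112021010.txt
```

If every timestamp is guaranteed to have the same number of digits, the two calls agree and the plain `max()` is enough.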