Sort List into different lists

I have a list of about 800 file names.

[Example] file_name = 23475048_43241u_43x_pos11_7.npz

I need to sort the file names into separate lists, grouped by the "pos" part of the name. In my example that is pos11 (there are other values too: pos0, pos12, ...).

First I tried to collect all the distinct pos values as keys of a dict:

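import glob
import os
from pathlib import Path

# my_dir is the directory that holds the .npz files (defined elsewhere)
path = glob.glob(os.path.join(my_dir, '*.npz'))

posList = []

for file in path:
    # e.g. ['23475048', '43241u', '43x', 'pos11', '7']
    file_name = Path(file).stem.split("_")
    posList.append(file_name[3])

# dict.fromkeys removes duplicates while keeping first-seen order
mylist = list(dict.fromkeys(posList))
files_dict = {}
for pos in mylist:
    files_dict[pos] = []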

Output:

{'pos0': [], 'pos10': [], 'pos11': [], 'pos12': [], 'pos1': [], 'pos2': [], 'pos3': [], 'pos4': [], 'pos5': [], 'pos6': [], 'pos7': [], 'pos8': [], 'pos9': []}

Now I want to fill the lists, but I'm stuck. I want to iterate over the list of file names again and add each one to the right list.

CodePudding user response：

I'm not sure what your code is doing, but you can use the program below. It takes a list of file names and outputs a dictionary of sorted lists keyed by the pos, which is what I think you are trying to do. (If not, please edit your question to elaborate.)

files = ['1_2_3_pos1_2.np', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict[pos] = files_dict.get(pos, []) + [file]

for k in files_dict.keys():
    files_dict[k].sort()

print(files_dict)

Edit: As @Stef suggested, you can make it more efficient by using setdefault:

files = ['1_2_3_pos1_2.np', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict.setdefault(pos, []).append(file)

for k in files_dict.keys():
    files_dict[k].sort()

print(files_dict)
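
Note that the snippets above split bare file names. The question builds its list with glob, which returns full paths, so an underscore in a directory name would shift the split index. Here is a minimal sketch of the same setdefault pattern applied to a directory, assuming every file name has the pos token as its fourth underscore-separated field. The helper name group_files_by_pos is hypothetical, not something from the question.

```python
from pathlib import Path


def group_files_by_pos(directory):
    """Group the .npz files in `directory` by their 'posN' token."""
    files_dict = {}
    for path in Path(directory).glob("*.npz"):
        # Split only the file name, never the directory part of the path.
        pos = path.name.split("_")[3]
        files_dict.setdefault(pos, []).append(path.name)
    # Sort each group so the result does not depend on directory order.
    for names in files_dict.values():
        names.sort()
    return files_dict
```

Calling group_files_by_pos(my_dir) would then return a dict like the one printed above, holding file names rather than full paths.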

CodePudding user response：

@ARandomDeveloper's answer clearly explains how to populate the dict by iterating through the list only once. I recommend studying their answer until you understand it well.

This is a very common way to populate a dict. You will probably encounter this pattern again.

Because grouping into a dict is so common, the more_itertools module offers a function, map_reduce, for exactly this purpose.

from more_itertools import map_reduce

posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n") # example list from uingtea's answer

d = map_reduce(posList, keyfunc=lambda f: f.split('_')[3])

print(d)
# defaultdict(None, {
#   'pos11': ['23475048_43241u_43x_pos11_7.npz'],
#   'pos1': ['23475048_43241u_43x_pos1_7.npz'],
#   'pos10': ['23475048_43241u_43x_pos10_7.npz'],
#   'pos8': ['23475048_43241u_43x_pos8_7.npz'],
#   'pos22': ['23475048_43241u_43x_pos22_7.npz'],
#   'pos2': ['23475048_43241u_43x_pos2_7.npz']
# })

Internally, map_reduce uses almost exactly the same code as suggested in @ARandomDeveloper's answer, except with a defaultdict.
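
If you would rather not install more_itertools, the defaultdict variant of the grouping loop can be sketched with the standard library alone. This only illustrates the pattern and is not the actual more_itertools source. The function name group_with_defaultdict is made up for this example, and it assumes the pos token is always the fourth underscore-separated field.

```python
from collections import defaultdict


def group_with_defaultdict(file_names):
    """Group file names by their 'posN' token using a defaultdict."""
    groups = defaultdict(list)
    for name in file_names:
        # A missing key is created as an empty list on first access.
        groups[name.split("_")[3]].append(name)
    return groups
```

With a defaultdict there is no need for get or setdefault, because looking up a missing key creates the empty list automatically.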

CodePudding user response：

You need to extract the digits after pos with the regex (\d+)_\d\.npz, then sort by that number:

import re

posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n")


posList = sorted(posList, key=lambda x: int(re.search(r"(\d+)_\d\.npz", x)[1]))
print(posList)

Result:

['23475048_43241u_43x_pos1_7.npz',
  '23475048_43241u_43x_pos2_7.npz',
  '23475048_43241u_43x_pos8_7.npz',
  '23475048_43241u_43x_pos10_7.npz',
  '23475048_43241u_43x_pos11_7.npz',
  '23475048_43241u_43x_pos22_7.npz'
]
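
If you also want the names grouped into a dict, the numeric sort can be combined with the grouping step. The following is a rough sketch, assuming every name contains exactly one _posN_ token. The helpers pos_number and group_in_pos_order are hypothetical names, and the sketch relies on dicts preserving insertion order (Python 3.7+).

```python
import re


def pos_number(name):
    """Return the integer after 'pos', e.g. 11 for 'a_b_c_pos11_7.npz'."""
    return int(re.search(r"_pos(\d+)_", name).group(1))


def group_in_pos_order(file_names):
    """Group names by pos, with keys inserted in numeric pos order."""
    groups = {}
    # Sorting by the integer puts pos2 before pos10, unlike a string sort.
    for name in sorted(file_names, key=pos_number):
        groups.setdefault(f"pos{pos_number(name)}", []).append(name)
    return groups
```

The keys then come out as pos1, pos2, pos8, pos10, ... instead of the string order pos0, pos10, pos11 shown in the question.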