How to sort a list by numerical value at specific point in name

Time:12-13

I have a list:

lst = ['45MO_221115_High4_d-1.tif', '45MO_221115_High4_d-2.tif', '45MO_221115_High4_d0.tif', '45MO_221115_High4_d1.tif', '45MO_221115_High4_d3.tif',
       '45MO_221115_Low1_d-1.tif', '45MO_221115_Low1_d-2.tif', '45MO_221115_Low1_d0.tif', '45MO_221115_Low1_d1.tif', '45MO_221115_Low1_d3.tif',
       '45MO_221115_Med2_d-1.tif', '45MO_221115_Med2_d-2.tif', '45MO_221115_Med2_d0.tif', '45MO_221115_Med2_d1.tif', '45MO_221115_Med2_d3.tif']

How can I sort the list so that the day -2 images (d-2) appear before the day -1 (d-1)? All the d-2 can appear at the beginning of each set so that:

lst_s = ['45MO_221115_High4_d-2.tif', '45MO_221115_High4_d-1.tif', '45MO_221115_High4_d0.tif', '45MO_221115_High4_d1.tif', '45MO_221115_High4_d3.tif',
         '45MO_221115_Low1_d-2.tif', '45MO_221115_Low1_d-1.tif', '45MO_221115_Low1_d0.tif', '45MO_221115_Low1_d1.tif', '45MO_221115_Low1_d3.tif',
         '45MO_221115_Med2_d-2.tif', '45MO_221115_Med2_d-1.tif', '45MO_221115_Med2_d0.tif', '45MO_221115_Med2_d1.tif', '45MO_221115_Med2_d3.tif']

or all the d-2 can be grouped at the beginning:

lst_s = ['45MO_221115_High4_d-2.tif', '45MO_221115_Low1_d-2.tif', '45MO_221115_Med2_d-2.tif',
         '45MO_221115_High4_d-1.tif',  '45MO_221115_Low1_d-1.tif', ...]

Both variants are fine. The second one is probably easier.

CodePudding user response:

One approach to match the second output (first d-2, then d-1, then the rest) is to do:

res = sorted(lst, key=lambda s: ('d-2' not in s, 'd-1' not in s))
print(res)

Output

['45MO_221115_High4_d-2.tif',
 '45MO_221115_Low1_d-2.tif',
 '45MO_221115_Med2_d-2.tif',
 '45MO_221115_High4_d-1.tif',
 '45MO_221115_Low1_d-1.tif',
 '45MO_221115_Med2_d-1.tif',
 '45MO_221115_High4_d0.tif',
 '45MO_221115_High4_d1.tif',
 '45MO_221115_High4_d3.tif',
 '45MO_221115_Low1_d0.tif',
 '45MO_221115_Low1_d1.tif',
 '45MO_221115_Low1_d3.tif',
 '45MO_221115_Med2_d0.tif',
 '45MO_221115_Med2_d1.tif',
 '45MO_221115_Med2_d3.tif']
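To see why that key works, it may help to look at the tuples it produces. The sketch below is only an illustration: the three-name `sample` list and the helper name `priority_key` are made up for this example. It relies on `False` sorting before `True`, on tuples being compared element by element, and on `sorted` being stable.

```python
# Illustrative sample; priority_key is a hypothetical name for the lambda above.
sample = ['45MO_221115_High4_d-1.tif',
          '45MO_221115_High4_d-2.tif',
          '45MO_221115_High4_d0.tif']

def priority_key(s):
    # False sorts before True, so names containing 'd-2' come first,
    # then names containing 'd-1', then everything else.
    return ('d-2' not in s, 'd-1' not in s)

keys = {name: priority_key(name) for name in sample}
for name, key in keys.items():
    print(name, '->', key)

ordered = sorted(sample, key=priority_key)
print(ordered)
```

Note that everything other than d-2 and d-1 gets the same key, so those names simply keep their original relative order, and a substring test such as 'd-1' would also match a name like d-10.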

CodePudding user response:

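Use the third-party natsort package directly. Its realsorted function treats signed numbers such as -2 as numbers rather than as text:

from natsort import realsorted

sort = realsorted(lst)
print(sort)

which gives: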

['45MO_221115_High4_d-2.tif',
 '45MO_221115_High4_d-1.tif',
 '45MO_221115_High4_d0.tif',
 '45MO_221115_High4_d1.tif',
 '45MO_221115_High4_d3.tif',
 '45MO_221115_Low1_d-2.tif',
 '45MO_221115_Low1_d-1.tif',
 '45MO_221115_Low1_d0.tif',
 '45MO_221115_Low1_d1.tif',
 '45MO_221115_Low1_d3.tif',
 '45MO_221115_Med2_d-2.tif',
 '45MO_221115_Med2_d-1.tif',
 '45MO_221115_Med2_d0.tif',
 '45MO_221115_Med2_d1.tif',
 '45MO_221115_Med2_d3.tif']

CodePudding user response:

The sorted function takes a key argument, described in its documentation. The value you pass to this argument can be a function that takes one argument - the element of the list being sorted - and returns a "key" to use for sorting in place of the item itself.
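As a quick illustration of the key argument on its own, separate from the filenames in the question, here is a minimal sketch that sorts a made-up list of words by length using the built-in len as the key function:

```python
# Made-up data, only to show how key changes the sort order.
words = ['pear', 'fig', 'banana']

# Each word is compared by len(word) instead of by the word itself.
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'pear', 'banana']

# sorted returns a new list; the original is left untouched.
print(words)      # ['pear', 'fig', 'banana']
```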

We can define such a key function to take the file name and return the relevant digits as an integer, using a regular expression to capture that portion of the filename:

import re

def file_to_key(filename):
    # Capture the optionally negative integer between "d" and ".tif".
    digits = re.findall(r"d(-?\d+)\.tif", filename)[0]
    return int(digits)

files = ['45MO_221115_High4_d-1.tif', '45MO_221115_High4_d-2.tif', '45MO_221115_High4_d0.tif', '45MO_221115_High4_d1.tif', '45MO_221115_High4_d3.tif', '45MO_221115_Low1_d-1.tif', '45MO_221115_Low1_d-2.tif', '45MO_221115_Low1_d0.tif', '45MO_221115_Low1_d1.tif', '45MO_221115_Low1_d3.tif', '45MO_221115_Med2_d-1.tif', '45MO_221115_Med2_d-2.tif', '45MO_221115_Med2_d0.tif', '45MO_221115_Med2_d1.tif', '45MO_221115_Med2_d3.tif']

files_s = sorted(files, key=file_to_key)
print(files_s)

which gives the desired order:

['45MO_221115_High4_d-2.tif', '45MO_221115_Low1_d-2.tif', '45MO_221115_Med2_d-2.tif',
 '45MO_221115_High4_d-1.tif', '45MO_221115_Low1_d-1.tif', '45MO_221115_Med2_d-1.tif',
 '45MO_221115_High4_d0.tif', '45MO_221115_Low1_d0.tif', '45MO_221115_Med2_d0.tif',
 '45MO_221115_High4_d1.tif', '45MO_221115_Low1_d1.tif', '45MO_221115_Med2_d1.tif',
 '45MO_221115_High4_d3.tif', '45MO_221115_Low1_d3.tif', '45MO_221115_Med2_d3.tif']
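If the regular expression is the unclear part, a small sketch like the following shows what it captures. It assumes the pattern is a literal d, an optional minus sign, one or more digits, and then .tif; the names `DAY_PATTERN`, `names` and `days` are introduced here only for illustration.

```python
import re

# Assumed pattern: 'd', optional '-', one or more digits, then '.tif'.
DAY_PATTERN = re.compile(r"d(-?\d+)\.tif")

names = ['45MO_221115_High4_d-2.tif',
         '45MO_221115_Low1_d0.tif',
         '45MO_221115_Med2_d3.tif']

days = []
for name in names:
    match = DAY_PATTERN.search(name)
    # group(1) is the text inside the parentheses, e.g. '-2'.
    day = int(match.group(1))
    days.append(day)
    print(name, '->', day)
```

The d in "Med2" is skipped because it is not followed by digits and then .tif, so only the day marker at the end of the name is picked up.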

To get the order you show in your first output option, we need to sort first by the part of the filename before the integer, and then by the integer itself. Doing that is easy: have the key function return those values in that order, in a tuple:

def file_to_key(filename):
    # Group 1: everything up to and including the "d"; group 2: the signed integer.
    search = re.findall(r"(.*d)(-?\d+)\.tif", filename)
    file_str, digits = search[0]
    return (file_str, int(digits))

files_s = sorted(files, key=file_to_key)
print(files_s)

Which gives:

['45MO_221115_High4_d-2.tif', '45MO_221115_High4_d-1.tif', '45MO_221115_High4_d0.tif', '45MO_221115_High4_d1.tif', '45MO_221115_High4_d3.tif',
 '45MO_221115_Low1_d-2.tif', '45MO_221115_Low1_d-1.tif', '45MO_221115_Low1_d0.tif', '45MO_221115_Low1_d1.tif', '45MO_221115_Low1_d3.tif',
 '45MO_221115_Med2_d-2.tif', '45MO_221115_Med2_d-1.tif', '45MO_221115_Med2_d0.tif', '45MO_221115_Med2_d1.tif', '45MO_221115_Med2_d3.tif']
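One thing to be aware of: indexing the findall result with [0] raises an IndexError for any name that does not match the pattern. If the real folder might contain other files, a more defensive key could look like the sketch below. The name `safe_file_key`, the `mixed` list, and the choice to push unmatched names to the end are all assumptions made for this example, not part of the original question.

```python
import re

# Assumed pattern: prefix ending in 'd', a signed integer, then '.tif' at the end.
PATTERN = re.compile(r"(.*d)(-?\d+)\.tif$")

def safe_file_key(filename):
    match = PATTERN.match(filename)
    if match is None:
        # Hypothetical policy: unmatched names sort after all matched ones,
        # in alphabetical order among themselves.
        return (1, filename, 0)
    prefix, day = match.groups()
    # Matched names sort by prefix first, then by the day as an integer.
    return (0, prefix, int(day))

# Made-up input including one name that does not follow the pattern.
mixed = ['45MO_221115_Low1_d1.tif', 'notes.txt',
         '45MO_221115_High4_d-1.tif', '45MO_221115_High4_d-2.tif',
         '45MO_221115_Low1_d-1.tif']

result = sorted(mixed, key=safe_file_key)
print(result)
```

The leading 0 or 1 in the tuple keeps the two kinds of key apart, so the sort never has to compare values from a matched name against those from an unmatched one.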