I have a list of paths, which I have simplified into similar but simpler strings here:
paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1', 'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']
These paths need sorting by the numbers they contain. The first number (apple) is the most important in the sort, followed by the second.
As may be clear from the list above, one added complication is that some paths have a third directory that holds the data, while others do not.
The MWE of the path structure looks as below:
parent
|-- apple1
|   |-- banana1
|   |   |-- data*
|   |-- banana2
|       |-- data*
|-- apple2
|   |-- banana1
|   |   |-- data*
|   |-- banana2
|       |-- data*
|-- apple10
    |-- banana1
    |   |-- carrot1
    |   |   |-- data*
    |   |-- carrot2
    |       |-- data*
    |-- banana2
        |-- carrot1
            |-- data*
The desired output is:
paths = ['apple1/banana1', 'apple1/banana2', 'apple2/banana1', 'apple2/banana2', 'apple10/banana1/carrot1', 'apple10/banana1/carrot2', 'apple10/banana2/carrot1']
I'm struggling to work out how to do this. A plain sort will not work, because the numbers go into double digits and, compared as strings, 10 comes before 2.
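For illustration, this is what a plain string sort does with the list above. Strings are compared character by character, so 'apple10' lands before 'apple2' because '1' < '2' (while 'apple1/' still comes before 'apple10' because '/' < '0'):

```python
paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']

# Default sort: lexicographic comparison of the raw strings.
lexicographic = sorted(paths)
print(lexicographic)
# ['apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1',
#  'apple10/banana1/carrot2', 'apple10/banana2/carrot1',
#  'apple2/banana1', 'apple2/banana2']
```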
I have seen another answer, to "How to correctly sort a string with a number inside?", which works when each string contains a single number, but I've failed to adapt it to my problem.
Any assistance would be greatly appreciated.
Answer:
Try sorted, supplying a custom key that uses re to extract all the numbers from the path:
>>> import re
>>> sorted(paths, key=lambda x: list(map(int, re.findall(r"\d+", x))))
['apple1/banana1',
'apple1/banana2',
'apple2/banana1',
'apple2/banana2',
'apple10/banana1/carrot1',
'apple10/banana1/carrot2',
'apple10/banana2/carrot1']
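One thing to be aware of with a numbers-only key: two paths with the same numbers but different words (say a hypothetical 'pear1/banana1' next to 'apple1/banana1') compare as equal, and because Python's sort is stable they simply keep their input order. A small variant, sketched here as an assumption rather than part of the answer above, adds the path string as a tie-breaker and compiles the pattern once:

```python
import re

# Compile once: \d+ matches each run of consecutive digits.
NUMBER_RE = re.compile(r"\d+")


def numeric_key(path):
    """Integers found in the path, with the path itself as a tie-breaker."""
    numbers = [int(n) for n in NUMBER_RE.findall(path)]
    return (numbers, path)


paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']
print(sorted(paths, key=numeric_key))

# Same numbers, different words: the plain string comparison decides.
print(sorted(['pear1/banana1', 'apple1/banana1'], key=numeric_key))
# ['apple1/banana1', 'pear1/banana1']
```

The tie-breaker is only a fallback: the numbers still dominate, and the words matter only when all the numbers match.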
Answer:
An addition to @not_speshal's answer:
Based on the answer to the question you linked: if the first word in the path is not necessarily "apple", you can do something like this:
import re

def atoi(text):
    # Turn runs of digits into ints so they compare numerically
    return int(text) if text.isdigit() else text

def word_and_num_as_tuple(text):
    # 'apple10' -> ('apple', 10, '')
    return tuple(atoi(c) for c in re.split(r'(\d+)', text))

def path_as_sortable_tuple(path, sep='/'):
    # One tuple per path component
    return tuple(word_and_num_as_tuple(word_in_path) for word_in_path in path.split(sep))

paths = [
    'apple10/banana2/carrot1',
    'apple10/banana1/carrot2',
    'apple2/banana1',
    'apple2/banana2',
    'apple1/banana1',
    'apple1/banana2',
    'apple10/banana1/carrot1'
]
paths.sort(key=path_as_sortable_tuple)
print(paths)
# And, of course, as a lambda one-liner:
paths.sort(key=lambda path: tuple(tuple(int(char_seq) if char_seq.isdigit() else char_seq for char_seq in re.split(r'(\d+)', subpath)) for subpath in path.split('/')))
It does exactly what @MarcinCuprjak suggested, but builds the tuples automatically from the path strings.
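To see why this key sorts correctly, it can help to print it. The sketch below restates the same idea with shorter, hypothetical names (natural_key and path_key are not from the answer above):

```python
import re


def natural_key(part):
    # The capturing group makes re.split keep the digit runs:
    # 'apple10' -> ['apple', '10', ''] -> ('apple', 10, '')
    return tuple(int(tok) if tok.isdigit() else tok
                 for tok in re.split(r"(\d+)", part))


def path_key(path, sep="/"):
    # One natural_key tuple per directory level.
    return tuple(natural_key(part) for part in path.split(sep))


print(path_key("apple10/banana2/carrot1"))
# (('apple', 10, ''), ('banana', 2, ''), ('carrot', 1, ''))
print(path_key("apple2/banana1"))
# (('apple', 2, ''), ('banana', 1, ''))
print(path_key("apple2/banana1") < path_key("apple10/banana2/carrot1"))
# True
```

Tuples compare element by element, so ('apple', 2, '') < ('apple', 10, '') is decided by 2 < 10. Because re.split with a capturing group always alternates text and digits, starting with text, strings and ints always sit at the same positions, so the comparison never has to compare a str with an int.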
Answer:
If you can represent your data as tuples instead of strings, then things get easier:
paths = [('apple', 10, 'banana', 2, 'carrot', 1),
('apple', 10, 'banana', 1, 'carrot', 2),
('apple', 2, 'banana', 1),
('apple', 2, 'banana', 2),
('apple', 1, 'banana', 1),
('apple', 1, 'banana', 2),
('apple', 10, 'banana', 1, 'carrot', 1)
]
paths.sort()  # tuples compare element by element, so the apple number decides first
print(paths)
The output is, I think, what you want:
[('apple', 1, 'banana', 1), ('apple', 1, 'banana', 2), ('apple', 2, 'banana', 1), ('apple', 2, 'banana', 2), ('apple', 10, 'banana', 1, 'carrot', 1), ('apple', 10, 'banana', 1, 'carrot', 2), ('apple', 10, 'banana', 2, 'carrot', 1)]
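If the data starts out as path strings, they can be turned into tuples of this shape inside a sort key. A minimal sketch, assuming every path component is a run of non-digits followed by a number (like 'banana2'); to_tuple is a hypothetical helper name:

```python
import re

# Assumption: each component is letters followed by digits, e.g. 'banana2'.
COMPONENT_RE = re.compile(r"(\D+)(\d+)")


def to_tuple(path):
    """'apple10/banana2/carrot1' -> ('apple', 10, 'banana', 2, 'carrot', 1)"""
    flat = []
    for part in path.split("/"):
        match = COMPONENT_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"unexpected path component: {part!r}")
        flat.extend((match.group(1), int(match.group(2))))
    return tuple(flat)


paths = ['apple10/banana2/carrot1', 'apple10/banana1/carrot2', 'apple2/banana1',
         'apple2/banana2', 'apple1/banana1', 'apple1/banana2', 'apple10/banana1/carrot1']
print(sorted(paths, key=to_tuple))
```

Using to_tuple as a key keeps the original strings in the result, so there is no need to join the tuples back into paths afterwards.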