Python: sort 2D list according to the properties of sublists

I have a 2D list:

ls = [
['-2,60233106656288100', '2', 'C'],
['-9,60233106656288100', '2', 'E'],
['-4,60233106656288100', '2', 'E'],
['-3,60233106656288100', '2', 'C'],
['-5,60233106656288100', '4', 'T'],
['-0,39019660724115224', '3', 'E'],
['-3,60233106656288100', '2', 'T'],
['-6,01086748514074000', '1', 'Q'],
['-5,02684650459461800', '0', 'X'],
['-1,25228509312138300', 'A', 'N'],
['-0,85517128843547330', '3', 'E'],
['1,837508975733196200', '3', '-', 'E'],
['1,850925075915637700', '5', '-', 'T'],
['1,826767133229081000', '4', '-', 'C'],
['1,845357865328532300', '3', '-', 'E'],
['0,636275318914609100', 'a', 'n', 'N']
]

I want to sort the shorter sublists first by the second column and then by the third column, so that the list stays ordered by the second column (the first row has 0 in the second column, then 1, then the five 2s, and so on), but the rows within the 2s switch places so that I get two E's first, then two C's, and then T. After that I want to sort the longer sublists by the fourth column. The row containing A should be the last of the shorter sublists, and the row containing a should be the last row overall. So the output should be as follows:

[
['-5,02684650459461800', '0', 'X'],
['-6,01086748514074000', '1', 'Q'],
['-9,60233106656288100', '2', 'E'],
['-4,60233106656288100', '2', 'E'],
['-3,60233106656288100', '2', 'C'],
['-2,60233106656288100', '2', 'C'],
['-3,60233106656288100', '2', 'T'],
['-0,39019660724115224', '3', 'E'],
['-0,85517128843547330', '3', 'E'],
['-5,60233106656288100', '4', 'T'],
['-1,25228509312138300', 'A', 'N'],
['1,837508975733196200', '3', '-', 'E'],
['1,845357865328532300', '3', '-', 'E'],
['1,826767133229081000', '4', '-', 'C'],
['1,850925075915637700', '5', '-', 'T'],
['0,636275318914609100', 'a', 'n', 'N']
]

I know that I can sort according to the second column as:

ls.sort(key=lambda x:x[1])

But this sorts the whole list and gives:

['-5,02684650459461800', '0', 'X']
['-6,01086748514074000', '1', 'Q']
['-2,60233106656288100', '2', 'C']
['-9,60233106656288100', '2', 'E']
['-4,60233106656288100', '2', 'E']
['-3,60233106656288100', '2', 'C']
['-3,60233106656288100', '2', 'T']
['-0,39019660724115224', '3', 'E']
['-0,85517128843547330', '3', 'E']
['1,837508975733196200', '3', '-', 'E']
['1,845357865328532300', '3', '-', 'E']
['-5,60233106656288100', '4', 'T']
['1,826767133229081000', '4', '-', 'C']
['1,850925075915637700', '5', '-', 'T']
['-1,25228509312138300', 'A', 'N']
['0,636275318914609100', 'a', 'n', 'N']

How can I implement the sorting so that I can choose a certain portion of the list, sort it, and then sort it again according to another column?

CodePudding user response：

How about:

ls.sort(key=lambda x: (l := len(x), x[1], '' if l < 4 else x[3]))

That would sort it by the length of the sublist first, then by the 2nd column, and finally by the 4th column if there is one. Where there isn't, the key uses '' instead, which sorts before any other string.
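As a rough sketch, the same key can also be written as a named function, which avoids the `:=` operator (it needs Python 3.8 or later). The helper name `sort_key` and the shortened sample rows are made up for illustration and are not part of the question's data.

```python
# Small sample in the same shape as the question's data (first column shortened).
rows = [
    ['a', '2', 'C'],
    ['b', '3', '-', 'E'],
    ['c', '0', 'X'],
    ['d', '3', '-', 'C'],
    ['e', 'A', 'N'],
]

def sort_key(row):
    # Hypothetical helper: length first, then 2nd column,
    # then the 4th column if the row has one ('' otherwise).
    n = len(row)
    return (n, row[1], row[3] if n >= 4 else '')

result = sorted(rows, key=sort_key)
print([r[0] for r in result])  # ['c', 'a', 'e', 'd', 'b']
```

Like the one-liner, this leaves the third column of the shorter rows out of the key, so rows that tie on length and second column keep their original relative order, because Python's sort is stable.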

CodePudding user response：

If I understand you correctly, you want to sort the list

first by the len of the sublists,
then by each of the elements in the list, except for the first, using the next element as a tie-breaker in case the previous are all equal

For this, you can use a tuple as the sort key, using the len and a slice of the sublist starting at the second element (i.e. at index 1):

ls.sort(key=lambda x: (len(x), x[1:]))

Note that this will also use elements after the fourth as further tie-breakers, which might not be wanted. Also this creates temporary (near) copies of all the sublists, which may be prohibitive if the lists are longer, even if all comparisons may be decided after the 3rd or 4th element.
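The tie-breaking relies on Python comparing tuples and lists element by element. A small sketch of that behaviour, with key values made up for illustration:

```python
# Lists compare element by element; the first differing pair decides.
assert ['2', 'C'] < ['2', 'E']
# If one list is a prefix of the other, the shorter one sorts first.
assert ['2', 'C'] < ['2', 'C', 'extra']
# Tuples work the same way, so the length is compared before the slice.
assert (3, ['9', 'Z']) < (4, ['0', 'A'])
# The columns are strings: digits sort before uppercase, uppercase before lowercase.
assert '4' < 'A' < 'a'

# Keys of the form (len(x), x[1:]) for a few illustrative rows.
keys = [(4, ['3', '-', 'E']), (3, ['2', 'E']), (3, ['A', 'N']), (3, ['2', 'C'])]
ordered = sorted(keys)
print(ordered)
# [(3, ['2', 'C']), (3, ['2', 'E']), (3, ['A', 'N']), (4, ['3', '-', 'E'])]
```

Because the columns are compared as strings, a value such as '10' would sort before '9'; that does not matter for the single-character values in the question.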

Alternatively, if you only need the first four, or ten, or whatever number of elements, you can create a closed slice and use that to compare:

ls.sort(key=lambda x: (len(x), x[1:4]))

Since out-of-range slice indices are clipped instead of raising an error, this works even if a list has fewer elements than the start or end index; the slice is then simply shorter, or empty.
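A minimal sketch of both points, using made-up rows (two of them with a fifth element) to show where the open and the closed slice differ:

```python
row = ['-5,02', '0', 'X']
in_range = row[1:4]   # end index past the last element: ['0', 'X']
past_end = row[5:9]   # start index past the last element: []

rows = [
    ['a', '2', 'E'],
    ['b', '2', 'C'],
    ['f', '3', '-', 'E', 'zzz'],
    ['d', '0', 'X'],
    ['g', '3', '-', 'E', 'aaa'],
]

# Closed slice: the fifth element is ignored, so 'f' and 'g' tie
# and keep their original order (the sort is stable).
closed = [r[0] for r in sorted(rows, key=lambda x: (len(x), x[1:4]))]

# Open slice: the fifth element acts as a further tie-breaker.
opened = [r[0] for r in sorted(rows, key=lambda x: (len(x), x[1:]))]

print(closed)  # ['d', 'b', 'a', 'f', 'g']
print(opened)  # ['d', 'b', 'a', 'g', 'f']
```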