Sorting a list of chromosomes in the correct order

Time:06-29

A seemingly simple problem, but one that's proving a bit vexing. I have a list of chromosomes (there are 23 chromosomes: chromosomes 1 to 21, then chromosome X and chromosome Y) like so:

['chr11','chr14','chr16','chr13','chr4','chr13','chr2','chr1','chr2','chr3','chr14','chrX',]

I would like to sort this in the following order:

['chr1', 'chr2','chr2','chr3','chr4','chr11','chr13','chr13', 'chr14','chr14','chr16','chrX']

However, because Python's sort is lexicographical, it will sort chr1, chr10, chr11, chr12, ..., chr2, etc. As I also have chromosome X, sorting by integer value doesn't seem like an option either. Would I have to specify a custom key by which to sort the list? Or is there some obvious solution I'm missing?
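
For illustration, the default behaviour can be reproduced with a shortened list (the list here is made up for the example, not my real data):

```python
# Plain sorted() compares strings character by character,
# so 'chr10' and 'chr11' land before 'chr2'.
chroms = ['chr11', 'chr2', 'chr1', 'chrX', 'chr10']
lexicographic = sorted(chroms)
print(lexicographic)
# ['chr1', 'chr10', 'chr11', 'chr2', 'chrX']
```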

As always, any help is much appreciated!

CodePudding user response:

You can use natsorted, what you want is natural sorting after all ;)

l = ['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
     'chr1','chr2','chr3','chr14','chrX','chrY']

from natsort import natsorted

out = natsorted(l)

output:

['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13',
 'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']

CodePudding user response:

You can create a custom key:

key={s:i for i,s in 
    enumerate([f'chr{x}' for x in list(range(1,22)) + ['X','Y']],1)}

>>> key
{'chr1': 1, 'chr2': 2, 'chr3': 3, 'chr4': 4, 'chr5': 5, 'chr6': 6, 'chr7': 7, 'chr8': 8, 'chr9': 9, 'chr10': 10, 'chr11': 11, 'chr12': 12, 'chr13': 13, 'chr14': 14, 'chr15': 15, 'chr16': 16, 'chr17': 17, 'chr18': 18, 'chr19': 19, 'chr20': 20, 'chr21': 21, 'chrX': 22, 'chrY': 23}

Then use that key as a lookup in sorted:

li = ['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
     'chr1','chr2','chr3','chr14','chrX','chrY']

>>> sorted(li, key=lambda s: key[s])
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']
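
One caveat with a plain dictionary lookup is that any label missing from the table raises a KeyError. A minimal sketch of a more forgiving variant, assuming unknown labels should simply go to the end (the name chrom_key and the 'chrM' label are hypothetical, used only for illustration):

```python
# Same idea as the lookup table in this answer: chr1..chr21, then chrX, chrY.
order = {f'chr{x}': i
         for i, x in enumerate(list(range(1, 22)) + ['X', 'Y'], 1)}

def chrom_key(name):
    # Known labels sort by their rank; unknown labels (assumption: they
    # belong at the end) get a rank past the table and are then ordered
    # alphabetically among themselves.
    return (order.get(name, len(order) + 1), name)

labels = ['chr11', 'chrM', 'chrX', 'chr2', 'chrY', 'chr1']
result = sorted(labels, key=chrom_key)
print(result)
# ['chr1', 'chr2', 'chr11', 'chrX', 'chrY', 'chrM']
```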

CodePudding user response:

natsort as already mentioned by @mozway is the fastest way.

Here is a solution that does not use external libraries:

sorted(l, key=lambda x: int(val) if (val:=x[3:]).isnumeric() else ord(val))

It gives the same output.
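
Note that the one-liner relies on every chromosome number being smaller than ord('X') (which is 88), and ord() only accepts a single character, so a suffix such as 'MT' would raise a TypeError. A possible variant that avoids both assumptions is to return a tuple as the key; this is only a sketch, and the function name chrom_key is made up here:

```python
def chrom_key(name):
    suffix = name[3:]  # drop the leading 'chr'
    if suffix.isdigit():
        # Numeric chromosomes come first, ordered by integer value.
        return (0, int(suffix), '')
    # Everything else (X, Y, ...) comes afterwards, ordered alphabetically.
    return (1, 0, suffix)

chroms = ['chr11', 'chrX', 'chr2', 'chrY', 'chr1', 'chr100']
ordered = sorted(chroms, key=chrom_key)
print(ordered)
# ['chr1', 'chr2', 'chr11', 'chr100', 'chrX', 'chrY']
```

The first tuple element decides which group a label belongs to, so the numeric and non-numeric parts are never compared with each other.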

CodePudding user response:

You could replace X and Y with 22 and 23 respectively inside a lambda function, strip the 'chr' prefix, and then sort the list on the remaining integer:

l = ['chr1', 'chr2','chr2','chr3','chr4','chr11','chr13','chr13', 'chr14','chr14','chr16','chrX']

sorted( l, key= lambda x: int(x.replace('X','22').replace('Y','23').replace('chr','')))

# OUTPUT
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16','chrX']

CodePudding user response:

You can use human (natural) sorting like this:

import re

def atoi(text):
    # Turn digit runs into integers so they compare numerically.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    # Split on runs of digits, keeping the digits as separate chunks.
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=['chr11','chr14','chr16','chr13','chr4','chr13','chr2','chr1','chr2','chr3','chr14','chrX',]

alist.sort(key=natural_keys)
print(alist)

output:
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX']


Or you can use the natsort library like this:

import natsort

# Avoid naming the variable "list", which would shadow the built-in.
chroms = ['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
          'chr1','chr2','chr3','chr14','chrX','chrY']
result = natsort.natsorted(chroms)
print(result)

output:
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']
