Sort a dict by multiple keys with int and None data in Python

Time:02-11

So I have a dict:

marker_dict[experiment_id] = {'SM0012AQ': { 'chromosome' : 1, 'linkageGroup': 8, 'positionCm' : 45, 'qualityStatus' : 'A' },
 'SM0075AQ': { 'chromosome' : 1, 'linkageGroup': 7, 'positionCm' : None, 'qualityStatus' : 'A' },
 'SM0078BQ': { 'chromosome' : 3, 'linkageGroup': 78, 'positionCm' : 7, 'qualityStatus' : 'B' },
 'SM0079PQ': { 'chromosome' : 4, 'linkageGroup': None, 'positionCm' : 80, 'qualityStatus' : 'B' },
 'SM0080BQ': { 'chromosome' : None, 'linkageGroup': 78, 'positionCm' : 447, 'qualityStatus' : 'T' }}

I want to sort my dict on 'chromosome', 'linkageGroup' and 'positionCm', in that order.

What I am doing:

marker_list = sorted(marker_dict[experiment_id].values(), key=lambda i: (i.chromosome, i.linkageGroup, i.positionCm))

And I got:

TypeError: '<' not supported between instances of 'NoneType' and 'int'

In Python 2, the code did this:

markerList=sorted(markerDic.values(), key=operator.attrgetter('chromosome','linkageGroup','positionCm'))

Could you help me, please? I am lost!

CodePudding user response:

Given

t = {'SM0012AQ': { 'chromosome' : 1, 'linkageGroup':8, 'positionCm' : 45,'qualityStatus' : 'A' },
    'SM0075AQ': { 'chromosome' : 1, 'linkageGroup': 7, 'positionCm' : None ,'qualityStatus' : 'A' },
    'SM0078BQ': { 'chromosome' : 3, 'linkageGroup': 78, 'positionCm' : 7,'qualityStatus' : 'B' },
    'SM0079PQ': { 'chromosome' : 4, 'linkageGroup': None, 'positionCm' : 80,'qualityStatus' : 'B' },
    'SM0080BQ': { 'chromosome' : None, 'linkageGroup': 78, 'positionCm' : 447,'qualityStatus' : 'T' }}

and assuming each None stands in for a missing int, you can solve it using

dict(sorted(t.items(), key=lambda i: (i[1]['chromosome'] if i[1]['chromosome'] is not None else -1, i[1]['linkageGroup'] if i[1]['linkageGroup'] is not None else -1 , i[1]['positionCm'] if i[1]['positionCm'] is not None else -1)))

where I've made the assumption that the numbers you have are 0 or positive, so that -1 sorts before all of them. If you can also have negatives, you need to change the -1 to something smaller than any value in your data, for example float('-inf').
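As a minimal sketch of the same idea with the repetition factored out: the key below substitutes float('-inf') for None, so it also works when the data contains negative numbers. The names FIELDS, NEG_INF and sort_key, and the trimmed sample data (including the negative positionCm), are assumptions made for illustration and are not part of the original question.

```python
# Hypothetical sample data, shaped like the dict in the question.
markers = {
    'SM0012AQ': {'chromosome': 1, 'linkageGroup': 8, 'positionCm': 45},
    'SM0075AQ': {'chromosome': 1, 'linkageGroup': 7, 'positionCm': None},
    'SM0078BQ': {'chromosome': 3, 'linkageGroup': 78, 'positionCm': -7},
    'SM0080BQ': {'chromosome': None, 'linkageGroup': 78, 'positionCm': 447},
}

# Fields to sort on, in priority order.
FIELDS = ('chromosome', 'linkageGroup', 'positionCm')

# Sentinel that compares lower than every int, including negatives.
NEG_INF = float('-inf')

def sort_key(item):
    """Build the sort key for one (name, marker) pair, mapping None to -inf."""
    _, marker = item
    return tuple(NEG_INF if marker[f] is None else marker[f] for f in FIELDS)

# Dicts keep insertion order (Python 3.7+), so the result stays sorted.
sorted_markers = dict(sorted(markers.items(), key=sort_key))
print(list(sorted_markers))
```

Entries with a None in a higher-priority field come first; to push them to the end instead, float('inf') could be used as the sentinel.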

The result:

{'SM0080BQ': {'chromosome': None,
  'linkageGroup': 78,
  'positionCm': 447,
  'qualityStatus': 'T'},
 'SM0075AQ': {'chromosome': 1,
  'linkageGroup': 7,
  'positionCm': None,
  'qualityStatus': 'A'},
 'SM0012AQ': {'chromosome': 1,
  'linkageGroup': 8,
  'positionCm': 45,
  'qualityStatus': 'A'},
 'SM0078BQ': {'chromosome': 3,
  'linkageGroup': 78,
  'positionCm': 7,
  'qualityStatus': 'B'},
 'SM0079PQ': {'chromosome': 4,
  'linkageGroup': None,
  'positionCm': 80,
  'qualityStatus': 'B'}}

CodePudding user response:

If all your values are greater than zero, you could keep your original approach and add 'or 0' to convert None values to zeroes, which puts them first in the sort order:

sorted(marker_dict[experiment_id].values(), 
       key=lambda i: (i.chromosome or 0, i.linkageGroup or 0, i.positionCm or 0))

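Otherwise, you could write a small function that handles None values by converting each number to a tuple that prefixes the value with True or False, depending on whether it is not None. Since False sorts before True, the None entries come first regardless of the sign of the other values.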

def None1st(*numbers): return [(n is not None,n or 0) for n in numbers]

sorted(marker_dict[experiment_id].values(), 
       key=lambda i: None1st(i.chromosome, i.linkageGroup, i.positionCm))
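A self-contained sketch of the flag-and-value approach above, combined with the operator.attrgetter call from the Python 2 code. The helper is renamed none_first here, and the marker objects (built with types.SimpleNamespace) and their values are hypothetical stand-ins for the real marker class.

```python
from operator import attrgetter
from types import SimpleNamespace

def none_first(*numbers):
    """Pair each value with a flag so that None sorts before any number."""
    # (False, 0) for None compares lower than (True, n) for every number n.
    return [(n is not None, n or 0) for n in numbers]

# Hypothetical marker objects with attribute access, as in the question.
markers = [
    SimpleNamespace(name='a', chromosome=1, linkageGroup=8, positionCm=45),
    SimpleNamespace(name='b', chromosome=1, linkageGroup=8, positionCm=-3),
    SimpleNamespace(name='c', chromosome=1, linkageGroup=8, positionCm=None),
    SimpleNamespace(name='d', chromosome=None, linkageGroup=2, positionCm=10),
]

# attrgetter with several names returns a tuple of the attribute values.
get_fields = attrgetter('chromosome', 'linkageGroup', 'positionCm')

ordered = sorted(markers, key=lambda m: none_first(*get_fields(m)))
print([m.name for m in ordered])
```

Because the flag is compared before the value, a None never gets compared with an int, and negative numbers such as -3 still sort after None.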