Home > database >  Python3 dictionary: remove duplicate values in alphabetical order
Python3 dictionary: remove duplicate values in alphabetical order

Time:11-17

Let's say I have the following dictionary:

full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1,
  'ba': 2,
  ...
}

I normally use standard dictionary comprehension to remove dupes like:

t = {val : key for (key, val) in full_dic.items()}
cleaned_dic = {val : key for (key, val) in t.items()}

Calling print(cleaned_dic) outputs {'ab': 1,'ba': 2, ...}

With this code, the key that remains seems to always be the final one in the list, but I'm not sure that's even guaranteed as dictionaries are unordered. Instead, I'd like to find a way to ensure that the key I keep is the first alphabetically.

So, regardless of the 'order' the dictionary is in, I want the output to be:

>> {'aa': 1,'ba': 2, ...}

Where 'aa' comes first alphabetically.


I ran some timer tests on 3 answers below and got the following (dictionary was created with random key/value pairs):

dict length:  10
# of loops:  100000
HoliSimo (OrderedDict): 0.0000098405 seconds
Ricardo: 0.0000115448 seconds
Mark (itertools.groupby): 0.0000111745 seconds

dict length:  1000000
# of loops:  10
HoliSimo (OrderedDict): 6.1724137300 seconds
Ricardo: 3.3102091300 seconds
Mark (itertools.groupby): 6.1338266200 seconds

We can see that for smaller dictionary sizes using OrderedDict is fastest but for large dictionary sizes it's slightly better to use Ricardo's answer below.

CodePudding user response:

t = {val : key for (key, val) in dict(sorted(full_dic.items(), key=lambda x: x[0].lower(), reverse=True)).items()}
cleaned_dic = {val : key for (key, val) in t.items()}
dict(sorted(cleaned_dic.items(), key=lambda x: x[0].lower()))
>>> {'aa': 1, 'ba': 2}

CodePudding user response:

You should use the OrderectDict class.

import collections
full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1
}
od = collections.OrderedDict(sorted(d.items()))

In this way you will be sure to have sorted dictionary (Original code: StackOverflow).

And then:

result = {}
for k, vin od.items():
if value not in result.values():
        result[key] = value

CodePudding user response:

Seems like you can do this with a single sort and itertools.groupby. First sort the items by value, then key. Pass this to groupby and take the first item of each group to pass to the dict constructor:

from itertools import groupby

full_dic = {
  'aa': 1,
  'ac': 1,
  'xx': 2,
  'ab': 1,
  'ba': 2,
}
groups = groupby(sorted(full_dic.items(), key=lambda p: (p[1], p[0])), key=lambda x: x[1])
dict(next(g) for k, g in groups)
# {'aa': 1, 'ba': 2}
  • Related