Home > database >  A more elegant way to cluster sorted characters together?
A more elegant way to cluster sorted characters together?

Time:10-23

Suppose, I want to sort characters in a string by their frequency.

def frequencySort(self, s: str) -> str:
    
    freq = {}
    
    for i in s:
        if i in freq:
            freq[i]  = 1
        else:
            freq[i] = 1
    
    print(freq)
    
    sorted_chars = sorted(s, key = lambda x : -freq[x])
    
    return "".join(sorted_chars)

However, for inputs where some characters have equal frequency, these characters end up intermingled together in the output.

E.g., for input loveleetcode the code above outputs eeeelolovtcd whereas it should output eeeeoollvtcd or eeeelloovtcd (o's and l's should be clustered together, not intermixed).

So, I add a tie breaker that uses a fraction of the character's ASCII code (end's up being less than one, so guaranteed not to overpower next most frequent char):

def frequencySort(self, s: str) -> str:
    
    freq = {}
    
    for i in s:
        if i in freq:
            freq[i]  = 1
        else:
            freq[i] = 1
    
    # Tie-breaker:
    
    for i in freq:
        freq[i]  = ord(i) / 1000
    
    sorted_chars = sorted(s, key = lambda x : -freq[x])
    
    return "".join(sorted_chars)

This works fine, but I was wondering if there is a more elegant (pythonic) way to enforce clustering of similar characters during the sort.

CodePudding user response:

Change the sort key to a tuple that includes the character itself.

def frequencySort(self, s: str) -> str:
    freq = {}
    for i in s:
        if i in freq:
            freq[i]  = 1
        else:
            freq[i] = 1
    
    sorted_chars = sorted(s, key=lambda x: (-freq[x], x))
    return "".join(sorted_chars)

Now, when sorted() encounters two different characters with the same count, it'll use the next element of the tuple to sort them. Since the next element is the character itself, the same characters will be sorted next to each other.

>>> solution.frequencySort("loveleetcode")
'eeeelloocdtv'

Unrelated note: you can use collections.Counter instead of creating a dict and counting the characters.

freq = collections.Counter(s)

instead of

freq = {}
for i in s:
    if i in freq:
        freq[i]  = 1
    else:
        freq[i] = 1

CodePudding user response:

You can use a collections.Counter and its most_common method that returns tuples of (char, count) sorted by count:

from collections import Counter

def frequencySort(s: str) -> str:
    counts = Counter(s)
    return ''.join([char * count for char, count in counts.most_common()])

print(frequencySort("loveleetcode"))
# 'eeeelloovtcd'
  • Related