How to sort a list by the frequency of a part of each item (items are strings)?

Time:06-27

I want to sort the following word pool by how often each word's 3-letter suffix occurs, from most frequent to least frequent:

wordPool = ['beat','neat','food','good','mood','wood','bike','like','mike']

Expected output:

['food','good','mood','wood','bike','like','mike','beat','neat']

For simplicity, the pool contains only 4-letter words, and the suffix is always the last 3 letters.

(Note: If the counts are the same, then order can be arbitrary.)

CodePudding user response:

You can use collections.Counter() to get the frequency of the suffixes, and then use sort() with a key parameter to sort by the generated frequencies:

from collections import Counter
suffix_counters = Counter(s[-3:] for s in wordPool)
wordPool.sort(key=lambda x: suffix_counters[x[-3:]], reverse=True)
print(wordPool)

This outputs:

['food', 'good', 'mood', 'wood', 'bike', 'like', 'mike', 'beat', 'neat']
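
One caveat with sorting on the count alone: if two different suffixes occur equally often, their words compare as equal and stay in input order, so they can end up interleaved. The question allows arbitrary order for ties, so this may not matter. If same-suffix words should stay together, a minimal sketch is to add the suffix itself as a secondary key. The function name sort_by_suffix_count and the parameter n are illustrative, not part of the answer above:

```python
from collections import Counter

def sort_by_suffix_count(words, n=3):
    """Return a new list ordered by suffix frequency, most frequent first."""
    counts = Counter(w[-n:] for w in words)
    # Primary key: negated count, so larger counts sort first.
    # Secondary key: the suffix, so tied suffixes are not interleaved.
    return sorted(words, key=lambda w: (-counts[w[-n:]], w[-n:]))

# Hypothetical input where 'ike' and 'eat' both occur three times.
tied = ['bike', 'beat', 'like', 'neat', 'mike', 'seat']
print(sort_by_suffix_count(tied))
# ['beat', 'neat', 'seat', 'bike', 'like', 'mike']
```

Unlike wordPool.sort(), this sketch uses sorted() and leaves the input list unchanged.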

CodePudding user response:

  • Group by suffix using a dict of lists;
  • Sort the groups in decreasing order of size;
  • Join all the groups into a list.

def sorted_by_suffix_frequency(wordpool, n=3):
    groups = {}
    for w in wordpool:
        groups.setdefault(w[-n:], []).append(w)
    return [w for g in sorted(groups.values(), key=len, reverse=True) for w in g]

wordpool = ['beat','neat','food','good','mood','wood','bike','like','mike']

sorted_wordpool = sorted_by_suffix_frequency(wordpool)

print(sorted_wordpool)
# ['food', 'good', 'mood', 'wood', 'bike', 'like', 'mike', 'beat', 'neat']
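
To see how the grouping approach treats ties, here is a small sketch of the same idea using collections.defaultdict. The name group_then_flatten and the sample input are assumptions made for illustration. Because dicts preserve insertion order (Python 3.7+) and sorted() is stable even with reverse=True, groups of equal size should come out in the order their suffix first appears, and each group stays contiguous:

```python
from collections import defaultdict

def group_then_flatten(words, n=3):
    """Group words by their last n letters, then emit the largest groups first."""
    groups = defaultdict(list)
    for w in words:
        groups[w[-n:]].append(w)
    # Stable sort: groups with the same size keep first-seen order.
    ordered = sorted(groups.values(), key=len, reverse=True)
    return [w for g in ordered for w in g]

# Hypothetical input where 'ike' and 'eat' both occur three times.
tied = ['bike', 'beat', 'like', 'neat', 'mike', 'seat']
print(group_then_flatten(tied))
# ['bike', 'like', 'mike', 'beat', 'neat', 'seat']
```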