Sort Words in a Dictionary by the Number of Times They Appear

String = "Today was a very a good day. Tomorrow might be a better day"

I want to group the words in a dictionary by the number of times they appear, so the output would be:

 {3: ['a'], 2: ['day'], 1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 
 'better']}

I am really unsure about how to approach this problem. The code I have so far:

String = "Today was a very a good day. Tomorrow might be a better day"
word = String.split()
frequency = {}
count = 0

for i in word:
    count += 1

CodePudding user response：

The collections module has defaultdict and Counter, which are made for problems like this:

Counter lets you fetch the number of times each unique token occurs in the data.
defaultdict lets you create a dictionary with lists as values, so you can restructure (and remove duplicates from) the output of Counter into the form you need.

from collections import defaultdict, Counter

String = "Today was a very a good day. Tomorrow might be a better day"
tokens = String.replace('.', '').split()  # remove the full stop

d = defaultdict(list)

for k,v in Counter(tokens).items():
    if k not in d[v]:  # Counter keys are already unique, so this check is only a safeguard
        d[v].append(k)
    
output = dict(d)
print(output)

{1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 'better'],
 3: ['a'],
 2: ['day']}
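
The dictionary above follows the order in which each count was first seen (1, 3, 2), while the question asks for the highest count first. Here is a minimal sketch of one way to get that order, assuming Python 3.7+ (where dicts keep insertion order). It uses Counter.most_common() to visit tokens from most to least frequent. The function name group_by_count is made up for illustration.

```python
from collections import Counter, defaultdict

def group_by_count(text):
    """Group words by how often they occur, highest count first."""
    tokens = text.replace('.', '').split()  # same full-stop removal as above
    groups = defaultdict(list)
    # most_common() yields (token, count) pairs sorted by count, descending;
    # tokens with equal counts keep the order in which they were first seen
    for token, count in Counter(tokens).most_common():
        groups[count].append(token)
    return dict(groups)

String = "Today was a very a good day. Tomorrow might be a better day"
result = group_by_count(String)
print(result)
```

This should print {3: ['a'], 2: ['day'], 1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 'better']}, which matches the output requested in the question.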

You can also get the unique tokens and their counts using numpy.unique instead of collections.Counter, but that would be a roundabout way of doing this.

import numpy as np

np.unique(tokens, return_counts=True)

CodePudding user response：

As an addition to Akshay's answer, here is how to put the result in ascending order:

output = dict(sorted(d.items(), key=lambda item: item[0],reverse=False))

Output:

{
1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 'better'], 
2: ['day'], 
3: ['a']
}

If you want descending order, just set reverse=True. And here is how you order by word: count

String = "Today was a very a good day. Tomorrow might be a better day"
word = String.split()
frequency = {}
count = 0

for i in word:
    if i not in frequency:
        frequency[i] = 1
    else:
        frequency[i] += 1

frequency = dict(sorted(frequency.items(), key=lambda item: item[1],reverse=True))

Output:

{
'a': 3, 
'Today': 1, 
'was': 1, 
'very': 1, 
'good': 1, 
'day.': 1, 
'Tomorrow': 1, 
'might': 1, 
'be': 1, 
'better': 1, 
'day': 1
}
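
Note that the output above counts 'day.' and 'day' as two different words, because split() leaves the full stop attached to the token. Here is a hedged sketch of one way around that: trim punctuation from each token with string.punctuation before counting. The helper name word_frequencies is an assumption for illustration, and only leading and trailing punctuation is removed.

```python
import string

def word_frequencies(text):
    """Count words after trimming leading/trailing punctuation from each token."""
    frequency = {}
    for raw in text.split():
        word = raw.strip(string.punctuation)  # 'day.' becomes 'day'
        if not word:  # skip tokens that consisted of punctuation only
            continue
        frequency[word] = frequency.get(word, 0) + 1
    # sort by count, highest first; sorted() is stable, so ties keep insertion order
    return dict(sorted(frequency.items(), key=lambda item: item[1], reverse=True))

String = "Today was a very a good day. Tomorrow might be a better day"
frequency = word_frequencies(String)
print(frequency)
```

With this change 'day' is counted twice and sorts right after 'a'.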

CodePudding user response：

You can also iterate over the set of words in the text and use the setdefault method:

text_list = "Today was a very a good day. Tomorrow might be a better day".replace('.','').split()
d = {}
for w in set(text_list):
    d.setdefault(text_list.count(w), []).append(w)
print(d)
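
One caveat with set(): its iteration order is arbitrary, so the words inside each list can come out in a different order from run to run. Here is a small variation that keeps first-appearance order, assuming Python 3.7+ so that dict.fromkeys() preserves insertion order. It is a sketch of the same setdefault idea, not a faster algorithm.

```python
text_list = "Today was a very a good day. Tomorrow might be a better day".replace('.', '').split()

d = {}
# dict.fromkeys() drops duplicates like set() does, but keeps first-appearance order
for w in dict.fromkeys(text_list):
    # list.count() rescans the whole list for every word; fine for short texts
    d.setdefault(text_list.count(w), []).append(w)
print(d)
```

For long texts, counting once with collections.Counter avoids the repeated scans done by list.count().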