String = "Today was a very a good day. Tomorrow might be a better day"
I want to group the words in a dictionary by the number of times each one appears. So the output will be:
{3: ['a'], 2: ['day'], 1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be',
'better']}
I am really unsure about how to approach this problem. The code I have so far:
String = "Today was a very a good day. Tomorrow might be a better day"
word = String.split()
frequency = {}
count = 0
for i in word:
    count = 1
CodePudding user response:
The collections module has defaultdict and Counter, made specifically for problems like this. Counter lets you fetch a count of how many times each unique token occurs in the data. defaultdict lets you create a dictionary with lists as values, so you can restructure the output of Counter (without duplicates) into the form you need.
from collections import defaultdict, Counter
String = "Today was a very a good day. Tomorrow might be a better day"
tokens = String.replace('.','').split() #remove the fullstop
d = defaultdict(list)
for k,v in Counter(tokens).items():
    if k not in d[v]: #if condition to ensure only unique tokens added
        d[v].append(k)
output = dict(d)
print(output)
{1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 'better'],
3: ['a'],
2: ['day']}
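The output above comes out in insertion order (1, 3, 2), while the question asked for the most frequent words first. As a minimal sketch, the same Counter and defaultdict steps can be wrapped in a function that also sorts the groups by count in descending order. The function name group_words_by_count is only an illustrative choice, not part of the original answer.

```python
from collections import Counter, defaultdict


def group_words_by_count(text):
    """Return {count: [words]} ordered from most to least frequent.

    Hypothetical helper name; it only strips full stops, as in the answer above.
    """
    tokens = text.replace('.', '').split()
    groups = defaultdict(list)
    # Counter keys are unique, so each word is appended exactly once.
    for word, n in Counter(tokens).items():
        groups[n].append(word)
    # Sort the (count, words) pairs by count, highest first.
    return dict(sorted(groups.items(), key=lambda item: item[0], reverse=True))


result = group_words_by_count(
    "Today was a very a good day. Tomorrow might be a better day"
)
print(result)
```

Because dictionaries keep insertion order in Python 3.7+, the keys of result come out as 3, 2, 1.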
You can also get the unique tokens and their counts using numpy.unique instead of collections.Counter, but that would be a roundabout way of doing this.
import numpy as np
np.unique(tokens, return_counts=True)
CodePudding user response:
As an addition to Akshay's answer, here is how to put the result in ascending order of count:
output = dict(sorted(d.items(), key=lambda item: item[0],reverse=False))
Output:
{
1: ['Today', 'was', 'very', 'good', 'Tomorrow', 'might', 'be', 'better'],
2: ['day'],
3: ['a']
}
If you want descending order, just pass reverse=True. And here is how you order by word: count instead:
String = "Today was a very a good day. Tomorrow might be a better day"
word = String.split()
frequency = {}
count = 0
for i in word:
    if i not in frequency:
        frequency[i] = 1  # first time this word is seen
    else:
        frequency[i] += 1  # increment the existing count
frequency = dict(sorted(frequency.items(), key=lambda item: item[1],reverse=True))
Output:
{
'a': 3,
'Today': 1,
'was': 1,
'very': 1,
'good': 1,
'day.': 1,
'Tomorrow': 1,
'might': 1,
'be': 1,
'better': 1,
'day': 1
}
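Note that 'day.' and 'day' are counted separately in the output above, because the full stop is never removed before splitting. One possible way around that, sketched here under the assumption that stripping leading and trailing punctuation from each token is acceptable for your text, is to clean the tokens first and let Counter.most_common do the sorting:

```python
from collections import Counter
import string

text = "Today was a very a good day. Tomorrow might be a better day"

# Strip leading/trailing punctuation so that "day." is counted as "day".
words = [w.strip(string.punctuation) for w in text.split()]

# most_common() returns (word, count) pairs sorted by descending count;
# words with equal counts keep the order in which they were first seen.
frequency = dict(Counter(words).most_common())
print(frequency)
```

With this change 'day' gets a count of 2 and sits right after 'a'.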
CodePudding user response:
You can also iterate over the set of words in the text and use the dict.setdefault method:
text_list = "Today was a very a good day. Tomorrow might be a better day".replace('.','').split()
d = {}
for w in set(text_list):
    d.setdefault(text_list.count(w), []).append(w)
print(d)
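One caveat with the set-based version: a set has no guaranteed iteration order, so the order of the words inside each list (and of the keys) can change between runs. If a stable order matters, a possible variation is to deduplicate with dict.fromkeys, which keeps the words in first-seen order. This is only a sketch of that idea; it still calls text_list.count once per unique word, so it is best suited to short texts.

```python
text_list = "Today was a very a good day. Tomorrow might be a better day".replace('.', '').split()

d = {}
# dict.fromkeys removes duplicates while preserving first-seen order,
# unlike set(), whose iteration order is arbitrary.
for w in dict.fromkeys(text_list):
    d.setdefault(text_list.count(w), []).append(w)
print(d)
```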