The code below gives me nearly the output I want, but not quite.
def reducer(self, year, words):
    x = Counter(words)
    most_common = x.most_common(3)
    sorted(x, key=x.get, reverse=True)
    yield (year, most_common)
This is giving me output
"2020" [["coronavirus",4],["economy",2],["china",2]]
What I would like it to give me is
"2020" "coronavirus china economy"
If someone could explain why I am getting a list of lists instead of the output I need, I would be most grateful, along with an idea of how to change the code to get what I want.
CodePudding user response:
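The documentation for Counter.most_common explains why you get a list of lists: the method returns a list of (element, count) tuples, which presumably show up as nested lists once your output is serialized.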
most_common(n=None) method of collections.Counter instance
List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
Sorting by frequency runs from highest to lowest (descending), while breaking ties alphabetically runs in ascending order. To do both in one ascending sort, use a tuple key that negates the frequency, i.e. (-freq, word).
from collections import Counter
words = ['coronavirus'] * 4 + ['economy'] * 2 + ['china'] * 2 + ['whatever']
x = Counter(words)
most_common = x.most_common(3)
# After sorting you need to discard the frequency from each (word, freq) tuple
result = ' '.join(word for word, _ in sorted(most_common, key=lambda pair: (-pair[1], pair[0])))
print(result)  # coronavirus china economy
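To tie this back to the question, here is a minimal sketch of how the same idea might look inside the reducer. It is written as a plain generator function rather than a method, since the surrounding job class is not shown; `top_words` is a hypothetical helper name, and `sample` is made-up data matching the counts in the question. It also sorts all counts before slicing, because `most_common(3)` picks its top three before the alphabetical tie-break is applied.

```python
from collections import Counter


def top_words(words, n=3):
    """Return the n most frequent words as one space-separated string.

    Ties in frequency are broken alphabetically. (Hypothetical helper,
    not part of the original code.)
    """
    counts = Counter(words)
    # Sort every (word, count) pair: highest count first, then A-Z.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    # Keep only the words of the top n pairs and join them.
    return ' '.join(word for word, _ in ranked[:n])


def reducer(year, words):
    # Stand-in for the reducer method in the question (no `self` here).
    yield (year, top_words(words))


# Made-up sample data with the same counts as in the question.
sample = ['coronavirus'] * 4 + ['economy'] * 2 + ['china'] * 2 + ['whatever']
print(list(reducer('2020', sample)))
```

In the real job you would keep the `self` parameter and yield `(year, top_words(words))` from the method; the helper can live at module level.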