This code finds the frequency of the words in a text file. I would like to know how to change it to find the longest words in the text file and print them out, e.g. "The longest word has 13 characters. The result includes: " followed by those words.
REDUCER CODE
import sys

results = {}
for line in sys.stdin:
    # Each input line is "word<TAB>count" as emitted by the mapper.
    word, frequency = line.strip().split('\t', 1)
    results[word] = results.get(word, 0) + int(frequency)
words = list(results.keys())
words.sort()
for word in words:
    print(word, results[word])
MAPPER CODE
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Tab-separate key and value so the reducer's split('\t', 1) works.
        print(word, "1", sep="\t")
User response:
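To build on my suggestion (loop through the words and keep the longest one in a variable):
import sys

longest = ""
for line in sys.stdin:
    for word in line.lower().split():
        # split() already drops surrounding whitespace, so no extra strip() is needed.
        # Strict ">" keeps only the first word of the maximum length; later ties are ignored.
        if len(word) > len(longest):
            longest = word
print("Longest word is:", longest, "with the length of:", len(longest))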
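The same single-pass idea can also be written with the built-in max() and key=len. Like the loop above, it returns only the first of several equally long words (in the sample, "quick" wins over "brown" and "jumps"). The helper name longest_word and the sample sentence are hypothetical, chosen just for illustration:

```python
def longest_word(text):
    """Return the first longest whitespace-separated word in text, lowercased."""
    words = text.lower().split()
    # max() with key=len returns the first word of maximal length;
    # default="" avoids a ValueError when the text has no words.
    return max(words, key=len, default="")


sample = "The quick brown fox jumps over the lazy dog"
longest = longest_word(sample)
print("Longest word is:", longest, "with the length of:", len(longest))
```

The default argument to max() requires Python 3.4 or later.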
User response:
If you don't want to keep all the words, you could do something like this:
import sys

longest = set()
max_length = 0
for line in sys.stdin:
    for word in line.strip().split():
        length = len(word)
        if length > max_length:
            # A new maximum: discard the shorter words collected so far.
            max_length = length
            longest = {word}
        elif length == max_length:
            # A tie: keep every word of the maximum length.
            longest.add(word)
print(longest)
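To get the exact output asked for in the question, the set-tracking loop above could be moved into the MapReduce reducer, reading the mapper's "word 1" lines instead of raw text. This is a minimal sketch, assuming a single reducer receives every word (otherwise each reducer only sees its own share of the keys); the function name longest_words is hypothetical. Splitting on any whitespace means it accepts both tab- and space-separated mapper output:

```python
import sys


def longest_words(lines):
    """Return (max_length, sorted list of longest words) from 'word count' lines."""
    max_length = 0
    longest = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field is the word; the count after it is not needed here.
        word = fields[0]
        if len(word) > max_length:
            max_length = len(word)
            longest = {word}
        elif len(word) == max_length:
            longest.add(word)  # the set also removes repeated words
    return max_length, sorted(longest)


def main():
    max_length, words = longest_words(sys.stdin)
    print(f"The longest word has {max_length} characters. "
          f"The result includes: {', '.join(words)}")


if __name__ == "__main__":
    main()
```

To try it without Hadoop, the scripts could be chained with a shell pipe such as `cat input.txt | python mapper.py | sort | python reducer.py`, where the file names are placeholders.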
If you want to keep all of them, grouped by length, you could use a defaultdict:
import sys
from collections import defaultdict

words_length = defaultdict(set)
for line in sys.stdin:
    for word in line.strip().split():
        words_length[len(word)].add(word)
# max() over the keys gives the greatest length; guard against empty input.
if words_length:
    print(words_length[max(words_length)])
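Because this version keeps every word, the same dictionary can answer more than "which is longest". For example, sorting its keys in descending order gives the few longest lengths and their words. A sketch, with the helper name group_by_length and the sample lines hypothetical:

```python
from collections import defaultdict


def group_by_length(lines):
    """Map each word length to the set of distinct words with that length."""
    words_length = defaultdict(set)
    for line in lines:
        for word in line.split():
            words_length[len(word)].add(word)
    return words_length


lines = ["the cat sat", "on a mat", "elephants roam"]
groups = group_by_length(lines)
# Walk the lengths from longest to shortest and show the top three groups.
for length in sorted(groups, reverse=True)[:3]:
    print(length, sorted(groups[length]))
```

With these sample lines it prints the groups for lengths 9, 4 and 3.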