I am running the Spark code below in Jupyter and getting an error.
import re
def normalizewords(text):
    return re.compile(r'\W+', re.UNICODE).split(text.lower())
inputs = sc.textFile('Book.txt')
words = inputs.flatMap(normalizewords)
# wordscount = words.countByValue()
wordcount = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
sortedwords = wordcount.map(lambda x,y: (y,x)).sortByKey()
sortedwords.collect()
The output of wordcount looks like this:
[('self', 111),
('employment', 75),
('building', 33),
('an', 178),
('internet', 26),
('business', 383),
('of', 970),
('one', 100)]
So the first thing I want to do is turn it into this:
[(111, 'self'),
(75,'employment')]
I have tried every variation of lambda x,y : y,x I could think of,
but none of them work.
If I put the right-hand side (x,y) in parentheses, I get an invalid syntax error.
CodePudding user response:
sortedwords = wordcount.sortByKey()
That sorts by key (the word); no extra lambda is needed.
Update: I think you can use the following. That said, why not use a DataFrame?
sortedwords = wordcount.sortBy(lambda x: (x[1]), ascending=False).sortBy(lambda x: (x[0]), ascending=True).map(lambda x: (x[1], x[0]))
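As a side note on why lambda x,y fails in the question: map hands each (word, count) pair to the function as a single argument, so a two-parameter lambda is called with too few arguments. Below is a minimal sketch in plain Python, using a list as a stand-in for the RDD (the parameter name kv is just illustrative). On the RDD itself, the same single-parameter form would be wordcount.map(lambda kv: (kv[1], kv[0])).sortByKey().

```python
# Plain-Python stand-in for the RDD: a few (word, count) pairs from the question.
wordcount = [('self', 111), ('employment', 75), ('building', 33)]

# A two-parameter lambda fails because map passes each pair as ONE argument.
try:
    list(map(lambda x, y: (y, x), wordcount))
    two_arg_error = None
except TypeError as exc:
    two_arg_error = exc

# Take the pair as a single parameter and index into it instead.
swapped = list(map(lambda kv: (kv[1], kv[0]), wordcount))

# Sorting the swapped pairs orders them by count (ascending).
sortedwords = sorted(swapped)

print(type(two_arg_error).__name__)
print(sortedwords)
```

Writing lambda (x, y): (y, x) is not an option in Python 3, since tuple parameters were removed from the language, which would explain an invalid syntax error.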