TypeError: <lambda>() missing 1 required positional argument: 'y'

Time:02-24

I am running the Spark code below in a Jupyter notebook and getting this error.

import re

def normalizewords(text):
    return re.compile(r'\W+', re.UNICODE).split(text.lower())

inputs = sc.textFile('Book.txt')
words = inputs.flatMap(normalizewords)
# wordscount = words.countByValue()
wordcount = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
sortedwords = wordcount.map(lambda x,y: (y,x)).sortByKey()
sortedwords.collect()

The output of wordcount looks like this:

[('self', 111),
 ('employment', 75),
 ('building', 33),
 ('an', 178),
 ('internet', 26),
 ('business', 383),
 ('of', 970),
 ('one', 100)]

So the first thing I want to do is turn it into this:

[(111,'self'),
 (75,'employment')]

I have tried every variation of lambda x,y : y,x I can think of, but none of them work. If I put the (x,y) in parentheses, I get an invalid syntax error.

CodePudding user response:

sortedwords = wordcount.sortByKey()

That's all you need, no extra lambda.

Update: I think you can use the following. However, why not use a DataFrame?

sortedwords = wordcount.sortBy(lambda x: (x[1]), ascending=False).sortBy(lambda x: (x[0]), ascending=True).map(lambda x: (x[1], x[0]))
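
As for the error itself: map hands each element of the RDD to the function as a single argument, so a lambda declared with two parameters (x, y) receives only the tuple and complains that y is missing. Below is a minimal sketch of that behaviour. It assumes a plain Python list and the built-in map as a stand-in for the RDD so that it runs without Spark, and the sample pairs are taken from the output in the question; the name kv is only illustrative.

```python
# Plain-Python stand-in for the RDD: a list of (word, count) pairs.
wordcount = [('self', 111), ('employment', 75), ('building', 33)]

# Like RDD.map, the built-in map over one iterable passes each element
# as ONE argument, so a two-parameter lambda is left without a value for y.
try:
    list(map(lambda x, y: (y, x), wordcount))
except TypeError as err:
    error_message = str(err)

# Fix: accept the whole pair as a single argument and index into it.
swapped = list(map(lambda kv: (kv[1], kv[0]), wordcount))

# Sorting the swapped pairs orders them by count, which is what
# sortByKey() would do once the count is the key.
sortedwords = sorted(swapped)
print(sortedwords)
```

In PySpark the same idea would read wordcount.map(lambda kv: (kv[1], kv[0])).sortByKey(). Writing lambda (x, y): ... is not an option in Python 3, because tuple parameters were removed from the language, which would explain the invalid syntax error.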