I have this code that outputs a list of values:
ARDD.map(function_B) \
.filter(lambda x: x is not None) \
.take(6)
Output:
['2','10','2','12','3','3']
How can I change the code to get this output?
[2:2, 3:2, 10:1, 12:1]
CodePudding user response:
Use the map and reduceByKey RDD methods:
rdd = spark.sparkContext.parallelize(['2', '10', '2', '12', '3', '3'])
# Pair each value with 1, sum the 1s per key, then format as "value:count"
rdd1 = rdd.map(lambda x: (x, 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .map(lambda x: f"{x[0]}:{x[1]}")
print(rdd1.collect())
# ['10:1', '12:1', '3:2', '2:2']  (element order is not guaranteed)
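The reduceByKey result above is not in the order the question asks for (2, 3, 10, 12). Because take(6) in the question already returns an ordinary Python list, one possible approach is to count and sort on the driver instead. The following is a minimal sketch under that assumption; it swaps in collections.Counter from the standard library for the RDD methods, and the names values, counts and result are only illustrative:

```python
from collections import Counter

# Stand-in for the list returned by .take(6) in the question.
values = ['2', '10', '2', '12', '3', '3']

# Count how many times each string occurs.
counts = Counter(values)

# Sort by the integer value of the key so that '10' sorts after '3',
# then format each pair as "value:count".
result = [f"{k}:{v}" for k, v in sorted(counts.items(), key=lambda kv: int(kv[0]))]
print(result)
# ['2:2', '3:2', '10:1', '12:1']
```

If the data is too large to bring to the driver, the same numeric ordering should be achievable on the RDD itself by calling sortBy(lambda kv: int(kv[0])) on the (value, count) pairs before the final map.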