I have an RDD containing values like this:
[
(Key1, ([2,1,4,3,5],5)),
(Key2, ([6,4,3,5,2],5)),
(Key3, ([14,12,13,10,15],5)),
]
and I need to sort the array part of each value, like this:
[
(Key1, ([1,2,3,4,5],5)),
(Key2, ([2,3,4,5,6],5)),
(Key3, ([10,12,13,14,15],5)),
]
I found two sorting methods in Spark: sortBy and sortByKey. I tried the sortBy method like this:
myRDD.sortBy(lambda x: x[1][0])
But unfortunately, it sorts the records of the RDD by the first element of each array rather than sorting the elements within each array.
Also, sortByKey does not seem to help, since it only sorts the records by their keys.
How can I achieve the sorted RDD?
CodePudding user response:
Try something like this:
rdd2 = rdd.map(lambda x: (x[0], (sorted(x[1][0]), x[1][1])))
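A minimal, self-contained sketch of the idea, assuming a local SparkContext and the variable names rdd and rdd2 used above (these names are illustrative, not from the original post):

from pyspark import SparkContext

sc = SparkContext("local", "sort-array-values")

# Sample RDD matching the structure in the question: (key, (list, count))
rdd = sc.parallelize([
    ("Key1", ([2, 1, 4, 3, 5], 5)),
    ("Key2", ([6, 4, 3, 5, 2], 5)),
    ("Key3", ([14, 12, 13, 10, 15], 5)),
])

# Sort the list inside each value while keeping the key and the count intact
rdd2 = rdd.map(lambda x: (x[0], (sorted(x[1][0]), x[1][1])))

print(rdd2.collect())
# [('Key1', ([1, 2, 3, 4, 5], 5)),
#  ('Key2', ([2, 3, 4, 5, 6], 5)),
#  ('Key3', ([10, 12, 13, 14, 15], 5))]

Since only the value changes, mapValues works equally well and leaves the key untouched:
rdd2 = rdd.mapValues(lambda v: (sorted(v[0]), v[1]))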