How to sort by value in Spark (Scala)

Time:09-17

I have an RDD of (key, value) pairs and I need to return the top 10 elements by value in descending order. As the actual output below shows, I'm getting the top elements by key instead of by value (in this case, ordered by ASCII character code).

For example:

//Input:
(the, 5),
(is, 10),
(me, 1)

//Expected Output:
(is, 10),
(the, 5),
(me, 1)

//Actual Output:
(the, 5),
(me, 1),
(is, 10)

My function:


def getActiveTaxis(taxiLines: RDD[Array[String]]): Array[(String, Int)] = {
    // Removing set up code for brevity

    val counts = keys.map(x => (x, 1))

    val sortedResult = counts.reduceByKey((a, b) => a + b).sortBy(_._2, false)

    sortedResult.top(10)
}

Answer:

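You should use take() instead of top().

take(n) returns the first n elements of the RDD in its current order, so after sortBy(_._2, false) it gives you the n largest values. top(n) ignores the existing order: it re-ranks the elements using the implicit Ordering[T] and returns the n largest. For (String, Int) tuples the default ordering compares the key first and the value only on ties, which is why your output comes back in descending key order.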

For details, see the implementation of top() in Spark's RDD.scala.
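
As a rough illustration of the difference, here is a minimal sketch that rebuilds the question's sample counts and retrieves the top entries by value in two ways. The object name TopByValue, the helper names, and the local[*] master are assumptions made for the example, not part of the original code. The second helper is an alternative not mentioned above: it keeps top() but passes an explicit Ordering that compares only the value, which avoids the full sort.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names are illustrative only.
object TopByValue {

  // Sort by value in descending order, then take the first n elements
  // in that order.
  def topBySortAndTake(counts: RDD[(String, Int)], n: Int): Array[(String, Int)] =
    counts.sortBy(_._2, ascending = false).take(n)

  // Keep top(), but supply an Ordering that compares the value only,
  // instead of the default tuple ordering (key first, then value).
  def topByValueOrdering(counts: RDD[(String, Int)], n: Int): Array[(String, Int)] =
    counts.top(n)(Ordering.by[(String, Int), Int](_._2))

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("TopByValue").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Rebuild the sample from the question: the -> 5, is -> 10, me -> 1.
      val keys = sc.parallelize(
        Seq.fill(5)("the") ++ Seq.fill(10)("is") ++ Seq("me"))
      val counts = keys.map(x => (x, 1)).reduceByKey(_ + _)

      println(topBySortAndTake(counts, 10).mkString(", "))
      println(topByValueOrdering(counts, 10).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data, both calls should print (is,10), (the,5), (me,1). If you only need a handful of elements from a large RDD, the top() variant with an explicit Ordering may be cheaper, since it does not sort the whole dataset first.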
