Scala spark dataframe map sorting as per key

Time:01-20

import spark.implicits._

import org.apache.spark.sql.Column

import org.apache.spark.sql.functions._

def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))

val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2)))
  .toDF("animal","ageMap")

testDF.show(false)

val testDF1 = testDF.withColumn("keySort",map_from_entries(array_sort(map_entries(col("ageMap")))))

This code runs fine on Spark 3.x. I want to run it on a Spark version below 3.

CodePudding user response:

Welcome to Stack Overflow!

From your comment I gather that your code was working in v3.2.2 and not in v2.4.5.

Your issue is that map_entries does not exist in Spark v2.4.5. You can get the same functionality by extracting the keys and values separately using map_keys and map_values, and then using arrays_zip to combine them into an array of (key, value) structs that array_sort can order by key.

The first bit is exactly the same:

import spark.implicits._
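import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._ // spark-shell imports this automatically; needed in a standalone application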

def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))
val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2))).toDF("animal","ageMap")

testDF.show(false)
+------+------------------------------------+
|animal|ageMap                              |
+------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|
+------+------------------------------------+

The difference is in how you define testDF1:

val testDF1 = testDF
  .withColumn("keys", map_keys(col("ageMap")))
  .withColumn("values", map_values(col("ageMap")))
  .withColumn("keySort", map_from_entries(array_sort(arrays_zip(col("keys"), col("values")))))
  .select("animal", "ageMap", "keySort")

testDF1.show(false)
+------+------------------------------------+------------------------------------+
|animal|ageMap                              |keySort                             |
+------+------------------------------------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|[black -> 5, cream -> 6, white -> 2]|
+------+------------------------------------+------------------------------------+

This code ran successfully on a v2.4.5 spark-shell.
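
To see why zipping keys with values and then sorting orders the map by key, here is a minimal plain-Scala sketch of the same idea, with no Spark involved. The helper name sortByKey is hypothetical and only for illustration. It relies on Scala tuples sorting by their first element and then their second, which mirrors how array_sort is expected to compare the zipped (key, value) structs field by field.

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper (not part of Spark): a plain-Scala analogue of
// map_from_entries(array_sort(arrays_zip(map_keys(m), map_values(m)))).
def sortByKey(m: Map[String, Int]): ListMap[String, Int] = {
  val keys = m.keys.toSeq        // analogue of map_keys
  val values = m.values.toSeq    // analogue of map_values
  val entries = keys.zip(values) // analogue of arrays_zip: pairs each key with its value
  // Tuples are ordered by first element, then second, so this sorts by key.
  // ListMap keeps the insertion order of the sorted entries.
  ListMap(entries.sorted: _*)
}

val dog = Map("cream" -> 6, "black" -> 5, "white" -> 2)
val sortedDog = sortByKey(dog)
// Keys of sortedDog now iterate as: black, cream, white
```

Because the key is the first field of each pair, the value only acts as a tie-breaker, and map keys are unique, so ties never occur.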
