how to use a custom function in query on Spark Dataframe using Scala


I load data from a database into a Spark DataFrame named DF, and then I need to extract the records whose ID satisfies a particular condition. So I defined this function:

def hash_id(id: String): Int = {
  val two_char = id.takeRight(2).toInt
  val hash_result = two_char % 4
  return hash_result
}

Then, I use the function in this query:

DF.filter(hash_id("ID")===3)

But I receive this error:

value === is not a member of Int 

DF has an ID column.

Could you please show me how to use a custom function in a where/filter clause?

Any help would be really appreciated.

CodePudding user response:

=== can only be used between Column objects. That's why you get the error value === is not a member of Int: the return type of your function hash_id is an Int, not a Column.
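
Note that, precisely because === compares Column expressions, a condition like yours can also be written entirely with built-in column functions, with no UDF at all. A sketch of an equivalent filter, assuming ID is a string column ending in two digits:

import org.apache.spark.sql.functions.{col, substring}

// substring with a negative position counts from the end of the string,
// so this takes the last two characters, casts them to int, and applies % 4
DF.filter(substring(col("ID"), -2, 2).cast("int") % 4 === 3)

Built-in column functions are generally preferable because Catalyst can optimize them, but a UDF works when the logic is hard to express as column expressions.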

That said, to use your hash_id function as written, you should convert it to a user-defined function (UDF) and apply it to a Column object, as follows:

import org.apache.spark.sql.functions.{col, udf}

def hash_id(id: String): Int = {
  val two_char = id.takeRight(2).toInt  // last two characters as an Int
  two_char % 4                          // hash bucket in 0..3
}

// Wrap the Scala function in a UDF so it can be applied to Column values
val hash_id_udf = udf((id: String) => hash_id(id))

DF.filter(hash_id_udf(col("ID")) === 3)
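
One caveat: toInt throws a NumberFormatException if the last two characters of an ID are not digits, and the UDF will also fail on null IDs. A minimal defensive sketch, assuming such values can occur in your data (safe_hash_id_udf is just an illustrative name):

import scala.util.Try
import org.apache.spark.sql.functions.{col, udf}

// Returns None (null in the DataFrame) for null or non-numeric IDs instead of throwing
val safe_hash_id_udf = udf((id: String) =>
  Option(id).flatMap(s => Try(s.takeRight(2).toInt % 4).toOption)
)

DF.filter(safe_hash_id_udf(col("ID")) === 3)

Rows where the UDF returns null simply fail the === 3 comparison and are filtered out.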
