I load data from a database into a Spark DataFrame named DF, then I need to extract the records whose ID satisfies a particular condition. So I defined this function:
def hash_id(id:String): Int = {
val two_char = id.takeRight(2).toInt
val hash_result = two_char % 4
return hash_result
}
Then, I use the function in this query:
DF.filter(hash_id("ID")===3)
But I receive this error:
value === is not a member of Int
DF has an ID column.
Could you please show me how to use a custom function in a where/filter clause?
Any help would be really appreciated.
CodePudding user response:
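=== can only be used between Column objects. That's why you get the error "value === is not a member of Int": the return type of your function hash_id is an Int, not a Column. In addition, hash_id("ID") passes the literal string "ID" to the function rather than the values of the ID column.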
To be able to use your function, you should convert it to a user-defined function (UDF) and apply that UDF to a Column object, as follows:
import org.apache.spark.sql.functions.{col, udf}
def hash_id(id:String): Int = {
val two_char = id.takeRight(2).toInt
val hash_result = two_char % 4
return hash_result
}
val hash_id_udf = udf((id: String) => hash_id(id))
DF.filter(hash_id_udf(col("ID")) === 3)
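If you want to try this end to end, here is a minimal, self-contained sketch. The local SparkSession, the object name HashIdFilterExample, the camelCase names and the sample IDs are all assumptions for illustration and not part of your setup. It also shows a possible alternative that avoids the UDF by expressing the same logic with built-in column functions (substring with a negative start position counts from the end of the string), which Spark can usually optimize better than an opaque UDF.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, udf}

object HashIdFilterExample {
  // Same logic as hash_id above: last two characters as an Int, modulo 4.
  def hashId(id: String): Int = id.takeRight(2).toInt % 4

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the example out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hash-id-filter")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for DF.
    val df = Seq("A0003", "A0004", "A0007", "A0011").toDF("ID")

    // Option 1: wrap the Scala function in a UDF and apply it to a Column.
    val hashIdUdf = udf((id: String) => hashId(id))
    df.filter(hashIdUdf(col("ID")) === 3).show()

    // Option 2: the same condition with built-in column functions only.
    // substring(col, -2, 2) takes the last two characters of the string.
    val lastTwoMod4 = substring(col("ID"), -2, 2).cast("int") % 4
    df.filter(lastTwoMod4 === 3).show()

    spark.stop()
  }
}
```

With this sample data both filters keep A0003, A0007 and A0011, since 3, 7 and 11 all leave remainder 3 when divided by 4. One difference to be aware of: if an ID does not end in two digits, the UDF version fails with a NumberFormatException, whereas the cast in the second version yields null (so the row is dropped) unless ANSI mode is enabled, in which case the cast also fails.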