Column value not properly passed to hive udf spark scala


I have created a Hive UDF like the one below:

package testpkg

import org.apache.hadoop.hive.ql.exec.UDF

class customUdf extends UDF {
  def evaluate(col: String): String = {
    col + "abc"
  }
}

I then registered the UDF in the SparkSession with:

sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""");

When I query the Hive table from Scala code with the statement below, the query never progresses, and it does not throw an error either:

SELECT testUDF(value) FROM t;

However, when I pass a string literal like the one below from the Scala code, it works:

SELECT testUDF('str1') FROM t;

I am running the queries via the SparkSession. I also tried with GenericUDF, but I am still facing the same issue. This happens only when I pass a Hive column. What could be the reason?
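
For reference, the surrounding Scala setup looks roughly like this (a sketch; the names are from my code above):

import org.apache.spark.sql.SparkSession

val sparksession = SparkSession.builder()
  .appName("udf-test")
  .enableHiveSupport() // Hive support is required for CREATE TEMPORARY FUNCTION
  .getOrCreate()

sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""")

sparksession.sql("SELECT testUDF('str1') FROM t").show() // works
sparksession.sql("SELECT testUDF(value) FROM t").show()  // never progresses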

CodePudding user response:

Try referencing your jar from HDFS:

create function testUDF as 'testpkg.customUdf' using jar 'hdfs:///jars/customUdf.jar';
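
If the jar is not distributed, the class may resolve on the driver but not on the executors, which can surface as a stalled query rather than an error once a real column forces execution on the cluster. A minimal sketch of shipping the jar first (the path here is just an example):

sparksession.sql("ADD JAR hdfs:///jars/customUdf.jar")
sparksession.sql("CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'")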

CodePudding user response:

I am not sure about the implementation of UDFs in Scala, but when I faced a similar issue in Java, I noticed a difference: if you plug in a literal

select udf("some literal value")

then the UDF receives it as a String. But when you select from a Hive table

select udf(some_column) from some_table

you may get what's called a LazyString, for which you need getObject to retrieve the actual value. I am not sure if Scala handles these lazy values automatically.
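
For what it's worth, a GenericUDF can unwrap these lazy values through its ObjectInspector instead of touching them directly. A rough Scala sketch of the same append-"abc" logic (untested; it reuses the class name from the question and skips the argument-type validation a real implementation would do):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class customUdf extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("testUDF takes exactly one argument")
    // A real implementation would also verify this is a string-category inspector
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get() // may be a lazy wrapper such as LazyString
    if (raw == null) {
      null
    } else {
      // getPrimitiveJavaObject resolves the lazy wrapper to a plain java.lang.String
      inputOI.getPrimitiveJavaObject(raw).toString + "abc"
    }
  }

  override def getDisplayString(children: Array[String]): String =
    children.mkString("testUDF(", ", ", ")")
}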
