I'm trying to create a simple UDF that concatenates 2 strings and a separator.
def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
customerDf.select("firstname", "lastname", stringConcat_udf(lit("-"), "firstname", "lastname")).show()
This is the traceback:
An exception was thrown from a UDF: 'TypeError: decoding str is not supported'. Full traceback below:
TypeError: decoding str is not supported
What is wrong with this?
CodePudding user response:
For one thing, PySpark already has a built-in function, concat_ws, that does exactly this:
from pyspark.sql import functions as fn
customerDf.select("firstname", "lastname", fn.concat_ws("-","firstname", "lastname").alias("joined")).show()
But if you still want to define this UDF: the return value of spark.udf.register("stringConcat_udf", stringConcat) isn't stored anywhere, which means the function is usable by name in Spark SQL queries, but you still need to define a Python UDF object to use it with PySpark DataFrames:
from pyspark.sql import functions as fn
from pyspark.sql.types import StringType
stringConcat_udf = fn.udf(stringConcat, StringType())
customerDf.select("firstname", "lastname", stringConcat_udf(fn.lit("-"),"firstname", "lastname").alias("joined")).show()
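As an alternative to creating a second UDF with fn.udf, you can keep the object that spark.udf.register returns (it has returned the wrapped UDF since Spark 2.3). The following is only a sketch of that idea: the helper register_and_use and the name string_concat are made up for illustration, and it assumes an active SparkSession plus a DataFrame with string columns firstname and lastname, as in the question.

```python
def string_concat(separator: str, first: str, second: str) -> str:
    """Join two strings with a separator (plain Python, no Spark needed)."""
    return first + separator + second


def register_and_use(spark, customer_df):
    """Hypothetical helper: register the UDF once, use it two ways.

    Assumes `spark` is an active SparkSession and `customer_df` has
    string columns `firstname` and `lastname`.
    """
    # Imported here so the pure function above stays usable without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # register() returns the wrapped UDF, so keep a reference to it.
    string_concat_udf = spark.udf.register(
        "stringConcat_udf", string_concat, StringType()
    )

    # DataFrame API: call the returned Python object directly.
    via_dataframe = customer_df.select(
        "firstname",
        "lastname",
        string_concat_udf(F.lit("-"), "firstname", "lastname").alias("joined"),
    )

    # SQL expression: refer to the UDF by its registered name.
    via_sql = customer_df.select(
        "firstname",
        "lastname",
        F.expr("stringConcat_udf('-', firstname, lastname)").alias("joined"),
    )
    return via_dataframe, via_sql
```

Calling register_and_use(spark, customerDf) and then .show() on either result should give the same joined column. Note that, unlike concat_ws, this plain + concatenation will fail on rows where a name is null.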
CodePudding user response:
After registering your UDF, you can call it using expr. Try this:
customerDf.select("firstname", "lastname", expr('stringConcat_udf("-", firstname, lastname)'))
This works:
from pyspark.sql import functions as F
customerDf = spark.createDataFrame([('Tom', 'Hanks')], ["firstname", "lastname"])
def stringConcat(separator: str, first: str, second: str):
    return first + separator + second
spark.udf.register("stringConcat_udf", stringConcat)
df = customerDf.select("firstname", "lastname", F.expr('stringConcat_udf("-", firstname, lastname)'))
df.show()
# +---------+--------+----------------------------------------+
# |firstname|lastname|stringConcat_udf(-, firstname, lastname)|
# +---------+--------+----------------------------------------+
# |      Tom|   Hanks|                               Tom-Hanks|
# +---------+--------+----------------------------------------+