I have the following function:
from pyspark.sql.functions import udf, struct
from pyspark.sql.types import StringType, ArrayType
def f(row):
    ...
    <compute my_field>
    print(f'my_field: {my_field}; type(my_field): {type(my_field)}')
    return str(my_field), StringType()
f_udf = udf(f)
new_df = df.withColumn('new_field', f_udf(struct([df[column] for column in df.columns if column != 'reserved'])))
Here's a sample of what gets printed out -
my_field: erfSSSWqd; type(my_field): <class 'str'>
and here is new_df
+--------------+----------------------------+
|field         |new_field                   |
+--------------+----------------------------+
|WERWERV511    |[Ljava.lang.Object;@280692a3|
|WEQMNHV381    |[Ljava.lang.Object;@3ee30d9c|
|FSLQCXV881    |[Ljava.lang.Object;@16cbf3a9|
|SDTEHLV980    |[Ljava.lang.Object;@54e6686 |
|SDFWERV321    |[Ljava.lang.Object;@72377b29|
+--------------+----------------------------+
But I would expect strings in the new_field column.
It looks like the types are all right; in fact, I don't even need to wrap my_field with str(), but I did that just in case.
Does anybody know how to fix this?
CodePudding user response:
Instead of returning the tuple str(my_field), StringType(), return only the value str(my_field). Because udf(f) defaults the return type to string, the returned tuple most likely arrives on the JVM side as an object array, and that array's toString() is what shows up as [Ljava.lang.Object;@... in your column.
Moreover, you can specify the return type of your UDF explicitly as the second parameter:
f_udf = udf(f, StringType())
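Putting both changes together, a minimal sketch of your snippet would look like this (the <compute my_field> body stays as a placeholder from your question):

from pyspark.sql.functions import udf, struct
from pyspark.sql.types import StringType

def f(row):
    <compute my_field>
    return str(my_field)  # return only the value, not a (value, type) tuple

# declare the return type explicitly as the second argument
f_udf = udf(f, StringType())

new_df = df.withColumn(
    'new_field',
    f_udf(struct([df[column] for column in df.columns if column != 'reserved']))
)

After that, new_df.select('new_field').show(truncate=False) should display the actual string values instead of the Java object references.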
Let me know if this works for you.