This might be a niche question, but imagine that you have a udf defined like this:
import pyspark.sql.functions as sf
import pyspark.sql.types as st
from typing import List

@sf.udf(returnType=st.ArrayType(st.StringType()))
def some_function(text: str) -> List[str]:
    return text.split(' ')
This returns a udf, and I need to know its returnType. Is there a way to get the return type:
- Without calling the udf on a pyspark.sql.DataFrame and inspecting the dtypes attribute of the result
- Without storing the returnType for this function in a separate place
Context:
I want to give an .alias to the pyspark.sql.column.Column that is returned by the udf, but the alias should depend upon its type.
So in dummy code the desired result would be:
input_column_name = 'some_text_column'
expr = some_udf_function(sf.col(input_column_name))
dtype_abbreviation = get_dtype_return_type_abbreviation(expr)
expr_renamed = expr.alias(input_column_name + '_' + dtype_abbreviation)
Here the desired return of get_dtype_return_type_abbreviation would be, for example, 'list_of_strings' for a udf that returns st.ArrayType(st.StringType()). The alias in this case would be 'some_text_column_list_of_strings'.
CodePudding user response:
You can access the returnType property of the udf:
import pyspark.sql.functions as sf
import pyspark.sql.types as st
from typing import List

@sf.udf(returnType=st.ArrayType(st.StringType()))
def some_function(text: str) -> List[str]:
    return text.split(' ')
print(some_function.returnType)
# output
ArrayType(StringType,true)
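From there, here is a minimal sketch of the aliasing described in the question, assuming a hypothetical helper get_dtype_abbreviation that maps DataType instances to short names (note that the returnType lives on the udf object itself, not on the Column it produces):

import pyspark.sql.functions as sf
import pyspark.sql.types as st

def get_dtype_abbreviation(dtype: st.DataType) -> str:
    # Hypothetical mapping from a DataType to a short, alias-friendly name.
    if isinstance(dtype, st.ArrayType):
        return 'list_of_' + get_dtype_abbreviation(dtype.elementType) + 's'
    if isinstance(dtype, st.StringType):
        return 'string'
    # Fall back to Spark's own short name, e.g. 'int' or 'double'.
    return dtype.simpleString()

input_column_name = 'some_text_column'
expr = some_function(sf.col(input_column_name))
# Read the return type from the udf object, since the resulting Column
# does not expose it.
dtype_abbreviation = get_dtype_abbreviation(some_function.returnType)
expr_renamed = expr.alias(input_column_name + '_' + dtype_abbreviation)
# expr_renamed is now aliased as 'some_text_column_list_of_strings'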