I'm still learning Python. In the following example (taken from Method 3 of this article), the user-defined function (UDF) is named Total(...,...), but the author calls it through the name new_f(...,...).
Question: In the code below, how do we know that the call new_f(...,...) invokes Total(...,...)? What if there were another UDF, say Sum(...,...)? How would the code then know whether new_f(...,...) means Total(...,...) or Sum(...,...)?
# import pyspark.sql.functions under the alias F
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType

# define a plain Python function: course fees minus discount
def Total(Course_Fees, Discount):
    res = Course_Fees - Discount
    return res

# wrap Total as a UDF whose return type is integer
new_f = F.udf(Total, IntegerType())

# create the new column Total_price by applying the UDF
# (df is assumed to be an existing DataFrame with these two columns)
new_df = df.withColumn(
    "Total_price", new_f("Course_Fees", "Discount"))

# show the DataFrame
new_df.show()
Answer:
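The line new_f = F.udf(Total, IntegerType()) passes Total to F.udf, which wraps it and returns a user-defined function; the assignment then binds that UDF to the name new_f. Calling new_f(...) therefore runs Total, because Total is the function that was passed in. A second function such as Sum would only become a UDF if it were wrapped in its own F.udf(Sum, ...) call and assigned to its own name, so there is nothing for the code to guess.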
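The same name-binding idea can be seen in plain Python without Spark. The sketch below is only an illustration: make_wrapper is a hypothetical stand-in for F.udf (it is not part of PySpark), and Sum and other_f are names invented for the example. Each wrapper remembers the function it was given, so the name on the left of the assignment decides nothing by itself.

```python
def make_wrapper(func):
    """Hypothetical stand-in for F.udf: return a callable that remembers func."""
    def wrapper(*args):
        # Always calls the function that was passed to make_wrapper
        return func(*args)
    return wrapper


def Total(course_fees, discount):
    return course_fees - discount


def Sum(course_fees, discount):
    return course_fees + discount


# Each name is bound to a wrapper around the function passed in
new_f = make_wrapper(Total)
other_f = make_wrapper(Sum)

print(new_f(100, 20))    # runs Total
print(other_f(100, 20))  # runs Sum
```

Swapping the argument, for example new_f = make_wrapper(Sum), would make new_f run Sum instead; the link comes from the argument, not from the variable name.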