Spark Dataframe - using User Defined Function to add a column

Time:05-21

I'm still learning Python. In the following example (taken from Method 3 of this article), the User Defined Function (UDF) is named Total(...,...), but the author calls it as new_f(...,...).

Question: In the code below, how do we know that the call new_f(...,...) invokes Total(...,...)? What if there were another UDF, say Sum(...,...)? How would the code know whether new_f(...,...) means Total(...,...) or Sum(...,...)?

# import the functions as F from pyspark.sql
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
  
# define the plain Python function: fee minus discount
def Total(Course_Fees, Discount):
    res = Course_Fees - Discount
    return res
  
# wrap Total as a UDF whose return type is integer
new_f = F.udf(Total, IntegerType())
  
# apply the UDF to two existing columns of df (an existing
# DataFrame) and store the result in a new column, Total_price
new_df = df.withColumn(
  "Total_price", new_f("Course_Fees", "Discount"))
  
# Showing the Dataframe
new_df.show()

CodePudding user response:

new_f = F.udf(Total, IntegerType()) 

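wraps the Python function Total in a Spark UDF and binds that UDF to the name new_f, so calling new_f(...,...) always runs Total. Nothing is looked up by guesswork: the function is passed to F.udf explicitly. If there were also a function Sum(...,...), it would only become a UDF if you passed it to F.udf yourself, and you would bind the result to a different name, e.g. sum_f = F.udf(Sum, IntegerType()).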
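
The same name binding can be seen without Spark. Below is a minimal sketch in plain Python; make_udf is a hypothetical stand-in for F.udf (it is not part of pyspark), and Sum and sum_f are names made up for illustration. The real F.udf returns an object that works on DataFrame columns, but the way names get attached is the same.

```python
# Hypothetical stand-in for F.udf: takes a plain function and returns
# a new callable that wraps it. It only mirrors the naming mechanics,
# not Spark's column handling.
def make_udf(func):
    def wrapper(*args):
        return func(*args)
    return wrapper


def Total(Course_Fees, Discount):
    return Course_Fees - Discount


# Made-up second function, to show there is no ambiguity.
def Sum(Course_Fees, Discount):
    return Course_Fees + Discount


# Each assignment binds a name to the wrapper built from the one
# function that was passed in.
new_f = make_udf(Total)  # new_f always calls Total
sum_f = make_udf(Sum)    # sum_f always calls Sum

print(new_f(1000, 200))  # 800
print(sum_f(1000, 200))  # 1200
```

Which function runs is decided on the line where the wrapper is created, not at the call site, so new_f can never end up calling Sum unless it is reassigned.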
