Spark Dataframe, add new column with function using other columns


In my Scala program, I have a DataFrame df with two columns a and b (both of type Int). Separately, I have a previously defined object obj with some methods and attributes. I want to add a new column to df using the current values of the DataFrame and attributes from obj.

For example, if I have the DataFrame below:

+---+---+
| a | b |
+---+---+
| 1 | 0 |
| 4 | 8 |
| 2 | 5 |
+---+---+

and if obj has an attribute num: Int = 10 as well as a method f(a: Int, b: Int): Int = { a + b - this.num } (sketched below, after the table), I want to use f to create the new column c like so:

+---+---+-----+
| a | b |  c  |
+---+---+-----+
| 1 | 0 | -9  |
| 4 | 8 |  2  |
| 2 | 5 | -3  |
+---+---+-----+
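
For concreteness, a simplified sketch of obj (the real object has more methods and attributes):

object obj {
  val num: Int = 10
  def f(a: Int, b: Int): Int = a + b - this.num
}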

So the idea is: for each row, take the values of columns a and b and call the method f on obj with a and b as arguments to get the value that we then store in the corresponding row of the new column c. I tried to do something like this:

df = df.withColumn("c", obj.f(col("a"), col("b")))

but obviously it doesn't work, as col() returns a Column and not the elements of that column. I also tried a foreach on a new column filled with 0s to fill the column row by row, but that didn't work either.

Do you see how I can achieve this in Scala?

Thank you.

CodePudding user response:

The same result can be achieved without a UDF, and performance will be better:

import org.apache.spark.sql.functions.{col, lit}

val num = 10
df.withColumn("c", col("a") + col("b") - lit(num))

Version with UDF:

import org.apache.spark.sql.functions.{col, udf}

val num = 10
val f = (a: Int, b: Int) => a + b - num
val fUDF = udf(f)
df.withColumn("c", fUDF(col("a"), col("b")))