In my Scala program, I have a dataframe df with two columns a and b (both of type Int). I also have a previously defined object obj with some methods and attributes. I want to add a new column to df using the current values of the dataframe and attributes from obj.
For example, if I have the dataframe below:
+---+---+
| a | b |
+---+---+
| 1 | 0 |
| 4 | 8 |
| 2 | 5 |
+---+---+
and if obj has an attribute num: Int = 10 as well as a method f(a: Int, b: Int): Int = {a + b - this.num}, I want to use f to create the new column c like so:
+---+---+----+
| a | b | c  |
+---+---+----+
| 1 | 0 | -9 |
| 4 | 8 | 2  |
| 2 | 5 | -3 |
+---+---+----+
So the idea is: for each row, take the values of columns a and b and call the method f on obj with a and b as arguments to get the value that we then store in the corresponding row of the new column c. I tried to do something like this:
df = df.withColumn("c", obj.f(col("a"), col("b")))
but obviously it doesn't work, as col() returns a Column and not the elements of that column. I also tried a foreach on a new column filled with 0s to fill the column row by row, but that didn't work either.
Do you see how I can achieve this in Scala?
Thank you.
CodePudding user response:
The same result can be achieved without a UDF, using built-in column expressions, and performance will be better:
import org.apache.spark.sql.functions.{col, lit}
val num = 10
df.withColumn("c", col("a") + col("b") - lit(num))
Version with UDF:
import org.apache.spark.sql.functions.{col, udf}
val num = 10
val f = (a: Int, b: Int) => {a + b - num}
val fUDF = udf(f)
df.withColumn("c", fUDF(col("a"), col("b")))
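If the logic has to stay inside obj.f, the method call itself can be wrapped in a UDF. The following is only a minimal sketch: the class Obj, the object AddColumnExample, and the local SparkSession setup are hypothetical stand-ins, since the question does not show how obj is defined. It assumes obj is serializable, because Spark needs to ship the closure that captures it to the executors.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for the asker's object. It extends Serializable
// so that the UDF closure capturing it can be sent to the executors.
class Obj(val num: Int) extends Serializable {
  def f(a: Int, b: Int): Int = a + b - this.num
}

object AddColumnExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-column-example")
      .getOrCreate()
    import spark.implicits._

    val obj = new Obj(10)
    val df = Seq((1, 0), (4, 8), (2, 5)).toDF("a", "b")

    // Wrap the method call in a function value so udf() can apply it row by row.
    val fUDF = udf((a: Int, b: Int) => obj.f(a, b))

    // DataFrames are immutable, so bind the result to a new val.
    val result = df.withColumn("c", fUDF(col("a"), col("b")))
    result.show()

    spark.stop()
  }
}
```

With the sample data from the question, column c should come out as -9, 2 and -3. When the computation is as simple as here, the built-in column expression remains preferable, since Spark cannot optimize through a UDF.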