I have a list that contains column names. I need to concatenate these columns, compute an MD5 hash of the result,
and append it to the DataFrame as a new column.
example
I have a table with the columns fname, lname, and address. The resulting DataFrame should contain fname, lname, address, and md5(concat_ws(",", fname, lname)).
My list would contain fname and lname.
code
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
val businessCols = List("fname", "lname")
val df = spark.table(s"${databaseName}.${databaseName}")
val new_df = df.
withColumn("concatenated_cols",concat_ws(",",$"businessCols": _*)).
withColumn("md5_hash", md5($"concatenated_cols"))
Error
found : org.apache.spark.sql.ColumnName
required: Seq[org.apache.spark.sql.Column]
withColumn("concatenated_cols",concat_ws(",",$"businessCols": _*)).
CodePudding user response:
import spark.implicits._
import org.apache.spark.sql.functions.{concat_ws, md5}
case class Test(fname: String, lname: String, address: String)
val df1 = spark.createDataset(Seq(Test("1", "1", "2")))
// Use Spark's built-in md5 so the hash is computed per row from the column values.
// A plain Scala md5(s: String) applied to concat_ws(...).toString would hash the
// text of the column expression, not the data.
df1.withColumn("md5(concat_ws(\",\",fname, lname))", md5(concat_ws(",", $"fname", $"lname")))
  .show(false)
CodePudding user response:
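I was able to fix this by creating a new list of Column objects from the column names:
import org.apache.spark.sql.Column
// Convert each column name to a Column; ++ appends it to the accumulated list.
val cols = businessCols.foldLeft(List[Column]()){(acc, l) => acc ++ List(col(l))}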
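For completeness, here is a minimal end-to-end sketch of how such a list of columns could be passed to concat_ws and hashed. It assumes a local SparkSession and a small made-up DataFrame standing in for the real table (the sample row and the names `newDf` and `md5_hash` are illustrative only). It uses `businessCols.map(col)`, which should produce the same List[Column] as the foldLeft version, and expands it into varargs with `: _*`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, md5}

// Local session for illustration only; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("md5-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real table.
val df = Seq(("John", "Doe", "1 Main St")).toDF("fname", "lname", "address")

val businessCols = List("fname", "lname")

// Turn each column name into a Column, then expand the list into varargs with `: _*`.
// $"businessCols" would instead refer to a single column literally named "businessCols".
val newDf = df.withColumn("md5_hash", md5(concat_ws(",", businessCols.map(col): _*)))

newDf.show(false)
```

One thing to keep in mind: concat_ws skips null values, so rows that differ only in which column is null can end up with the same hash. If that matters, the columns may need a placeholder (for example via coalesce) before concatenation.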