How do I access dataframe column value within udf via scala

Time:11-23

I am attempting to add a column to a dataframe, using a value from a specific column (let's assume it's an id) to look up its actual value from another df.

So I set up a lookup def

// Needs `import spark.implicits._` in scope for .as[String]
def lookup(id: String): String = {
  lookupdf.select("value")
    .where(s"id = '$id'").as[String].first
}

The lookup def works if I test it on its own: when I pass it an id string, it returns the corresponding value.

But I’m having a hard time finding a way to use it within the “withColumn” function.

dataDf
  .withColumn("lookupVal", lit(lookup(col("someId"))))

It rightly complains that I'm passing in a Column instead of the expected String. The question is: how do I give it the actual value from that column?

CodePudding user response:

You cannot access another dataframe from inside withColumn. Think of it this way: withColumn can only access data at the level of a single record of dataDf.
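The same record-level rule is what makes a UDF work: Spark calls the function once per row with that row's column value, so the function gets a plain String rather than a Column. What follows is a minimal sketch of that idea, not the approach recommended in this answer. It assumes the lookup table is small enough to collect to the driver and that its "id" and "value" columns are both strings. The names UdfLookupSketch and addLookupVal are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfLookupSketch {
  // Assumption: lookupDf has string columns "id" and "value" and fits in driver memory.
  def addLookupVal(spark: SparkSession, dataDf: DataFrame, lookupDf: DataFrame): DataFrame = {
    import spark.implicits._

    // Turn the lookup DataFrame into an ordinary Scala Map on the driver.
    val lookupMap: Map[String, String] =
      lookupDf.select("id", "value").as[(String, String)].collect().toMap

    // Broadcast the Map so each executor receives one read-only copy.
    val lookupBc = spark.sparkContext.broadcast(lookupMap)

    // The UDF receives the someId value of one row at a time;
    // returning an Option lets an unknown id become null in the new column.
    val lookupUdf = udf((id: String) => lookupBc.value.get(id))

    dataDf.withColumn("lookupVal", lookupUdf(col("someId")))
  }
}
```

Note that the UDF reads from an in-memory Map and never touches lookupdf itself. When the lookup table is too large to collect, a join is the better fit.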

Use a join instead, for example:

// "someId" is the key column named in the question; the right join keeps every dataDf row
val resultDf = lookupDf.select("value", "id")
  .join(dataDf, lookupDf("id") === dataDf("someId"), "right")
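For anyone who wants to try the join end to end, here is a small self-contained sketch. It starts from dataDf and uses a left join, which keeps every data row just as the right join above does. The sample rows, the "name" column, and the names JoinLookupSketch and withLookup are made up for illustration. The column names "someId", "id", "value" and "lookupVal" come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinLookupSketch {
  // Adds a "lookupVal" column to dataDf by joining on dataDf.someId = lookupDf.id.
  // Assumption: dataDf has no columns named "id" or "value" of its own.
  def withLookup(dataDf: DataFrame, lookupDf: DataFrame): DataFrame =
    dataDf
      .join(lookupDf, dataDf("someId") === lookupDf("id"), "left")
      .drop(lookupDf("id"))                     // the key is already present as someId
      .withColumnRenamed("value", "lookupVal")  // match the column name from the question

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-lookup-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrames.
    val lookupDf = Seq(("a1", "Alpha"), ("b2", "Beta")).toDF("id", "value")
    val dataDf = Seq(("row1", "a1"), ("row2", "b2"), ("row3", "zz")).toDF("name", "someId")

    // Rows whose someId has no match in lookupDf get null in lookupVal.
    withLookup(dataDf, lookupDf).show()

    spark.stop()
  }
}
```

Unlike the lookup def in the question, which would start a separate query for every id, the join resolves all ids in one distributed operation.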