How to add column to a DataFrame where value is fetched from a map with other column from row as key

Time:12-08

I'm new to Spark and trying to figure out how to add a column to a DataFrame whose value is fetched from a HashMap, where the key is another value on the same row.

For example, I have a map defined as follows:

val myMap: Map[Int, Int] = generateMap()

I want to add a new column to my DataFrame whose value is fetched from this map, using the value of an existing column as the key. A solution might look like this:

val newDataFrame = dataFrame.withColumn("NEW_COLUMN", lit(myMap.get(col("EXISTING_COLUMN"))))

My issue with this code is that the col function returns a Column rather than an Int, so it can't be used as a key into my HashMap.

Any suggestions?

CodePudding user response:

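You need to use a UDF. col only builds a Column expression, so the map lookup has to run inside a function that Spark applies to each row.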

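import org.apache.spark.sql.functions.{col, udf}
// Look up each key in myMap; keys missing from the map default to 0
val mapUDF = udf((i: Int) => myMap.getOrElse(i, 0))
val newDataFrame = dataFrame.withColumn("NEW_COLUMN", mapUDF(col("EXISTING_COLUMN")))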
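
For anyone who wants to try this end to end, here is a minimal sketch of the UDF approach as a standalone program. The object name MapUdfExample, the helper addMappedColumn, the sample map, and the local SparkSession settings are all assumptions made for illustration; the sample map stands in for generateMap(), which the question does not show.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object MapUdfExample {
  // Hypothetical helper: adds NEW_COLUMN by looking up EXISTING_COLUMN in `lookup`.
  // Keys that are not in the map get 0, as in the answer above.
  def addMappedColumn(df: DataFrame, lookup: Map[Int, Int]): DataFrame = {
    // The map is captured by the closure and shipped to the executors with the UDF.
    val mapUDF = udf((i: Int) => lookup.getOrElse(i, 0))
    df.withColumn("NEW_COLUMN", mapUDF(col("EXISTING_COLUMN")))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-udf-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for generateMap() and the real DataFrame.
    val myMap: Map[Int, Int] = Map(1 -> 10, 2 -> 20)
    val dataFrame = Seq(1, 2, 3).toDF("EXISTING_COLUMN")

    addMappedColumn(dataFrame, myMap).show()

    spark.stop()
  }
}
```

With this sample data, the row with key 3 falls back to the default of 0 because the map has no entry for it.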

CodePudding user response:

Create a dataframe from the map and join it?
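
A rough sketch of what that could look like, assuming the map is small enough to turn into a lookup DataFrame on the driver. The object name MapJoinExample, the helper addMappedColumn, and the sample data are hypothetical. The broadcast hint and the fallback to 0 for unmatched keys are my own additions: the first is optional, and the second only mirrors getOrElse(i, 0) from the other answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, coalesce, col, lit}

object MapJoinExample {
  // Hypothetical helper: turns `lookup` into a two-column DataFrame and left-joins it.
  def addMappedColumn(df: DataFrame, lookup: Map[Int, Int]): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._

    // One row per map entry: (key, value).
    val mapDF = lookup.toSeq.toDF("EXISTING_COLUMN", "NEW_COLUMN")

    // A left join keeps rows whose key is missing from the map;
    // coalesce then replaces the resulting null with 0 (assumed default).
    df.join(broadcast(mapDF), Seq("EXISTING_COLUMN"), "left")
      .withColumn("NEW_COLUMN", coalesce(col("NEW_COLUMN"), lit(0)))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-join-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for generateMap() and the real DataFrame.
    val myMap: Map[Int, Int] = Map(1 -> 10, 2 -> 20)
    val dataFrame = Seq(1, 2, 3).toDF("EXISTING_COLUMN")

    addMappedColumn(dataFrame, myMap).show()

    spark.stop()
  }
}
```

Compared with a UDF, the join stays within Spark's built-in operators, so the optimizer can plan it like any other join. If you want unmatched keys to stay null instead, drop the coalesce step.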
