I am working with nested JSON to build an ETL. I am mainly using explode and withColumn to flatten the nested JSON. One of the keys in the JSON holds a state name, and I want to convert that value into the abbreviated state name. For example: Texas should become TX.
.withColumn("NewColName", $"col.someKey.keyStoringValue")
This gets me the associated value from the required key in the JSON. So if ...someKey.keyStoringValue's value is Texas, my code returns Texas. But I want it to be TX. How can I do that? I have tried to create a function like this:
def getStateName(state: String): String = {
  val maps = Map("Texas" -> "TX", "Virginia" -> "VA")
  maps.apply(state)
}
.....
.....
.withColumn("NewColName", getStateName($"col.someKey.keyStoringValue"))
.....
.....
I am getting this error:
error: type mismatch;
found: org.apache.spark.sql.ColumnName
required: String
CodePudding user response:
You can't apply a plain Scala function directly in the DataFrame/Spark SQL API. For this kind of mapping you need to create a UDF:
import org.apache.spark.sql.functions.udf
val maps = Map("Texas" -> "TX", "Virginia" -> "VA")
// getOrElse falls back to the input for unmapped states instead of throwing
val fun = udf { (in: String) => maps.getOrElse(in, in) }
...
.withColumn("NewColName", fun($"col.someKey.keyStoringValue"))
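For reference, the lookup inside the UDF is plain Scala and can be unit-tested without a SparkSession. A minimal sketch (getStateName mirrors the asker's helper; the fallback to the original value for unmapped states is an assumption about desired behavior):

```scala
// Plain-Scala lookup that the UDF wraps; testable without Spark.
val stateAbbrev = Map("Texas" -> "TX", "Virginia" -> "VA")

// Total function: unmapped states fall back to the original value
// instead of throwing a NoSuchElementException.
def getStateName(state: String): String =
  stateAbbrev.getOrElse(state, state)
```

Wrapping this as udf(getStateName _) keeps the lookup logic separately testable from the Spark pipeline.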
CodePudding user response:
The method you are looking for is when (see the documentation), which allows you to return a value based on a condition. You'd do something like:
df.withColumn(
  "NewColName",
  when($"col.someKey.keyStoringValue" === "Texas", "TX")
    .when($"col.someKey.keyStoringValue" === "Virginia", "VA")
    .otherwise("Unknown")
)
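For comparison, the per-row logic of this when/otherwise chain corresponds to a plain Scala pattern match (a sketch, not Spark code; abbreviate is an illustrative name):

```scala
// Pattern-match equivalent of the when/otherwise chain above:
// matched states map to their abbreviation, everything else to "Unknown".
def abbreviate(state: String): String = state match {
  case "Texas"    => "TX"
  case "Virginia" => "VA"
  case _          => "Unknown"
}
```

With many states, hand-writing the chain gets verbose, which is one reason the UDF-over-a-Map approach from the first answer tends to scale better.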