Asking for help. I have a column with this schema:
Data: map (nullable = true)
 |-- key: string
 |-- value: map (valueContainsNull = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
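For reference, the printed schema corresponds to the Spark type below: a map whose values are themselves maps. This is a minimal sketch using the public org.apache.spark.sql.types classes; the object name NestedMapSchema is hypothetical. Because both levels are MapType (not StructType), a Scala UDF receives this column as Map[String, Map[String, String]], not as Row values.

```scala
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

// Hypothetical helper object, used only to spell out the schema from the question.
object NestedMapSchema {
  // Inner map: string keys to nullable string values.
  val innerMap: MapType = MapType(StringType, StringType, valueContainsNull = true)

  // Outer map: string keys to (possibly null) inner maps.
  val dataType: MapType = MapType(StringType, innerMap, valueContainsNull = true)

  // A single nullable column named "Data" holding the nested map.
  val schema: StructType = StructType(Seq(StructField("Data", dataType, nullable = true)))

  def main(args: Array[String]): Unit =
    println(schema.treeString) // the same tree layout that df.printSchema() prints
}
```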
I referred to the question "Passing a map with struct-type key into a Spark UDF" and created a UDF to concatenate strings:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val myUDF1 = udf((inputMapping: Map[String, Row]) => inputMapping
  .map { case (key, value) => (key, (value.getString(0), value.getString(1))) }
  .map { case (key, (i1, i2)) => (key, i1 + i2) }
)
df.withColumn("udfResult", myUDF1($"Data")).show()
I want to do the same thing, but instead of adding values, I want to delete a key from the inner maps, whose values are strings. How can I achieve this? I tried the code above but got this error:
Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.spark.sql.Row (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.spark.sql.Row is in unnamed module of loader 'app')
In other words, I want to delete a specific key from the nested MapType column, that is, from the inner map stored as the value of the outer map (the "value: map" field in the schema above).
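The UDF above declares the inner values as Row, which only matches struct values: Spark passes StructType values to a Scala UDF as Row, but MapType values arrive as Scala Maps. A minimal sketch, assuming the Data column has the schema shown above, types the UDF as Map[String, Map[String, String]] and rebuilds the outer map with the unwanted key removed from every inner map. The names DropNestedKeyUdf and dropNestedKey are hypothetical, and the local SparkSession exists only to make the example runnable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical names, for illustration only.
object DropNestedKeyUdf {
  type Nested = Map[String, Map[String, String]]

  // MapType columns reach a Scala UDF as Scala Maps, so the parameter is a Map of Maps
  // rather than Map[String, Row]. Null outer or inner maps are passed through as null.
  def dropNestedKey(nestedKey: String): UserDefinedFunction = udf { (in: Nested) =>
    if (in == null) null
    else in.map { case (outerKey, inner) =>
      outerKey -> (if (inner == null) null else inner - nestedKey)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("drop-nested-key").getOrCreate()
    import spark.implicits._

    val df = Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2"))).toDF("Data")
    // Overwrite Data with the same outer map, minus "key2" in each inner map.
    df.withColumn("Data", dropNestedKey("key2")(col("Data"))).show(false)

    spark.stop()
  }
}
```

Unlike returning a single inner map, this keeps every outer key, so the column keeps its original shape.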
Answer:
Welcome to StackOverflow. Maybe this function can help:
import org.apache.spark.sql.functions.udf
def extractNestedKey(key: String, nestedKey: String) = udf { (in: Map[String, Map[String, String]]) => in(key) - nestedKey }
Consider a simple DataFrame (I create it from a Dataset because that is the simplest way):
import spark.implicits._
val ds = spark.createDataset(Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2")))).withColumnRenamed("value", "Data")
It looks like this:
+---------------------------------------+
|Data                                   |
+---------------------------------------+
|{key -> {key -> value, key2 -> value2}}|
+---------------------------------------+
Applying the UDF:
ds.withColumn("Data2", extractNestedKey("key", "key2")($"Data")).show(false)
creates the column Data2, which holds the inner map stored under "key" with the nested key "key2" removed:
+---------------------------------------+--------------+
|Data                                   |Data2         |
+---------------------------------------+--------------+
|{key -> {key -> value, key2 -> value2}}|{key -> value}|
+---------------------------------------+--------------+
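Note that extractNestedKey returns only the inner map stored under key (hence {key -> value} in Data2), not the whole outer map, and in(key) throws a NoSuchElementException for rows where key is absent, which would fail the Spark task. The sketch below runs the same expression on a plain Scala Map next to a lenient variant that uses getOrElse; the object and method names are hypothetical and not part of the answer.

```scala
// Hypothetical names, for illustration only; no Spark needed.
object ExtractNestedKeyBody {
  type Nested = Map[String, Map[String, String]]

  // Same expression as the body of extractNestedKey: look up the inner map, drop nestedKey.
  def strict(in: Nested, key: String, nestedKey: String): Map[String, String] =
    in(key) - nestedKey

  // Variant: a missing outer key yields an empty map instead of throwing.
  def lenient(in: Nested, key: String, nestedKey: String): Map[String, String] =
    in.getOrElse(key, Map.empty[String, String]) - nestedKey

  def main(args: Array[String]): Unit = {
    val data: Nested = Map("key" -> Map("key" -> "value", "key2" -> "value2"))
    println(strict(data, "key", "key2"))      // Map(key -> value)
    println(lenient(data, "missing", "key2")) // Map()
    // strict(data, "missing", "key2") would throw java.util.NoSuchElementException
  }
}
```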
Answer:
You don't need a UDF here, because UDFs are relatively expensive; you can just use the Dataset map method. I used a Dataset here, but you can use a DataFrame as well.
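import spark.implicits._ // for toDS() and the Nst encoder

case class Nst(key: String, value: Map[String, Map[String, String]])
val removeList = List("key222") // nested keys to delete
val ds = Seq(Nst("key1", Map("key11" -> Map("key111" -> "111", "key222" -> "222")))).toDS()
// Rebuild each outer map, dropping the keys in removeList from every inner map
val result = ds.map(nst => nst.copy(value = nst.value.map { case (k, nestedMap) => k -> (nestedMap -- removeList) }))
result.show(false)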
+----+--------------------------+
|key |value                     |
+----+--------------------------+
|key1|{key11 -> {key111 -> 111}}|
+----+--------------------------+
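If you are on Spark 3.0 or later, a different technique (not the one this answer uses) also avoids a UDF: the built-in higher-order functions transform_values and map_filter can drop nested keys directly on a DataFrame column, without converting rows to case classes. A minimal sketch, assuming the nested map lives in a column named value as in the example above; the object name DropNestedKeysBuiltin is hypothetical.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, map_filter, transform_values}

// Hypothetical object name; requires Spark 3.0+ for transform_values and map_filter.
object DropNestedKeysBuiltin {
  // For every outer entry, keep only the inner entries whose key is not in removeList.
  // A null inner map stays null, because map_filter returns null for a null input.
  def dropNestedKeys(data: Column, removeList: Seq[String]): Column =
    transform_values(data, (_, inner) => map_filter(inner, (k, _) => !k.isin(removeList: _*)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("drop-nested-keys").getOrCreate()
    import spark.implicits._

    val df = Seq(Map("key11" -> Map("key111" -> "111", "key222" -> "222"))).toDF("value")
    df.withColumn("value", dropNestedKeys(col("value"), Seq("key222"))).show(false)

    spark.stop()
  }
}
```

Because these are native expressions, Spark can evaluate them without deserializing each row into Scala objects, which is the main cost both a UDF and a typed map pay.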