Deleting key from nested map

Time:11-24

Asking for help:

Data: map (nullable = true)
    |-- key: string
    |-- value: map (valueContainsNull = true)
    |    |-- key : string
    |    |-- value : string (valueContainsNull = true)

I referred to the linked question, "Passing a map with struct-type key into a Spark UDF", and created a UDF to concatenate strings:

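import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Each map value is expected to be a struct (Row) with two string fields.
val myUDF1 = udf((inputMapping: Map[String, Row]) => inputMapping
     .map { case (key, value) => (key, (value.getString(0), value.getString(1))) }
     .map { case (key, (i1, i2)) => (key, i1 + i2) }
     )

df.withColumn("udfResult", myUDF1($"Data")).show()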

I want to do the same thing, but instead of adding integers I want to delete a key from the values, which are of string type. How can I achieve that? I tried the code above, but I get this error:

Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.spark.sql.Row (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.spark.sql.Row is in unnamed module of loader 'app')

I want to delete a specific key from the value of the outer map, which is itself a nested MapType column:

Data: map (nullable = true)
    |-- key: string
    |-- value: map (valueContainsNull = true)
    |    |-- key : string
    |    |-- value : string (valueContainsNull = true)

CodePudding user response:

Welcome to StackOverflow. Maybe this function can help:

import org.apache.spark.sql.functions.udf

def extractNestedKey(key: String, nestedKey: String) = udf { in: Map[String, Map[String, String]] => in(key) - nestedKey }

Consider a simple DataFrame (I create it from a Dataset because that is very simple):

import spark.implicits._ // encoders and the $"..." column syntax

val ds = spark.createDataset(Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2")))).withColumnRenamed("value", "Data")

It is:

+---------------------------------------+
|Data                                   |
+---------------------------------------+
|{key -> {key -> value, key2 -> value2}}|
+---------------------------------------+

And applying the udf:

ds.withColumn("Data2", extractNestedKey("key", "key2")($"Data"))

This creates a new column that holds the inner map without the nested key:

+---------------------------------------+--------------+
|Data                                   |Data2         |
+---------------------------------------+--------------+
|{key -> {key -> value, key2 -> value2}}|{key -> value}|
+---------------------------------------+--------------+
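Note that the UDF above returns only the inner map for one outer key, and in(key) throws if that outer key is absent. If the goal is to keep the outer map and drop the nested key from every inner map, the core logic could look like the sketch below. It is plain Scala with no Spark dependency; the names NestedMapOps and dropNestedKey are hypothetical, and the null checks are an assumption based on the nullable schema shown in the question.

```scala
object NestedMapOps {
  // Hypothetical helper, not part of the original answer: removes `nestedKey`
  // from every inner map and leaves the outer keys unchanged.
  def dropNestedKey(
      data: Map[String, Map[String, String]],
      nestedKey: String
  ): Map[String, Map[String, String]] =
    if (data == null) null // the Data column is declared nullable
    else
      data.map { case (outerKey, inner) =>
        // Inner maps may be null too (valueContainsNull = true), so guard them.
        outerKey -> Option(inner).map(_ - nestedKey).orNull
      }
}
```

Wrapped in a UDF, this would presumably be used as udf((data: Map[String, Map[String, String]]) => NestedMapOps.dropNestedKey(data, "key2")) and applied to $"Data" in the same way as extractNestedKey.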

CodePudding user response:

You don't need a UDF, because it's expensive; you can just use the map method. I have used a Dataset here, but you can use a DataFrame as well:

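import spark.implicits._ // provides toDS() and the case-class encoder

case class Nst(key: String, value: Map[String, Map[String, String]])

val removeList = List("key222")
val ds = Seq(Nst("key1", Map("key11" -> Map("key111" -> "111", "key222" -> "222")))).toDS()

// Rebuild the outer map strictly; mapValues returns a lazy view and is deprecated in Scala 2.13.
val result = ds.map(nst => nst.copy(value = nst.value.map { case (k, nestedMap) => k -> (nestedMap -- removeList) }))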

result.show(false)
+----+--------------------------+
|key |value                     |
+----+--------------------------+
|key1|{key11 -> {key111 -> 111}}|
+----+--------------------------+
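On Spark 3.0 or later, the same result can probably also be reached without a UDF or a typed map, by using the built-in higher-order functions transform_values and map_filter from org.apache.spark.sql.functions. This is a different technique from the one in the answer above. A minimal sketch, assuming a local SparkSession and the single-column Data layout from the question; the names NativeNestedKeyRemoval and withoutNestedKey are hypothetical:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, map_filter, transform_values}

object NativeNestedKeyRemoval {
  // Hypothetical helper: for every entry of the outer map, keep only the
  // inner entries whose key differs from `nestedKey`.
  def withoutNestedKey(data: Column, nestedKey: String): Column =
    transform_values(data, (_, inner) => map_filter(inner, (k, _) => k =!= nestedKey))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder().master("local[*]").appName("drop-nested-key").getOrCreate()
    import spark.implicits._

    val df = Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2"))).toDF("Data")

    // Adds Data2 with the same outer keys and "key2" removed from each inner map.
    df.withColumn("Data2", withoutNestedKey(col("Data"), "key2")).show(false)

    spark.stop()
  }
}
```

Because the expression stays inside Spark SQL, the optimizer can see it and no rows have to be converted to Scala objects, which is the usual argument for preferring built-in functions over UDFs.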