How can I merge an array of maps of maps into a single map?

Current

+-------+---------------------------------------------------------------------------+
|ID     |map                                                                        |
+-------+---------------------------------------------------------------------------+
|105    |[{bia, {4 -> 1}}, {compton, {5 -> 1}}, {alcatraz, {3 -> 6}}]               |
|106    |[{compton, {4 -> 5}}]                                                      |
|107    |[{compton, {5 -> 99}}]                                                     |
|108    |[{bia, {1 -> 5}}, {compton, {1 -> 1}}]                                     |
|101    |[{alcatraz, {1 -> 2}}]                                                     |
|102    |[{alcatraz, {1 -> 2}}]                                                     |
|103    |[{alcatraz, {1 -> 2}}, {alcatraz, {2 -> 2}}, {alcatraz, {3 -> 2}}]         |
|104    |[{alcatraz, {1 -> 4}}, {alcatraz, {2 -> 2}}, {alcatraz, {3 -> 2}}]         |
+-------+---------------------------------------------------------------------------+

Desired


+-------+---------------------------------------------------------------------------+
|ID     |map                                                                        |
+-------+---------------------------------------------------------------------------+
|105    |{bia, {4 -> 1}}, {compton, {5 -> 1}}, {alcatraz, {3 -> 6}}                 |
|106    |{compton, {4 -> 5}}                                                        |
|107    |{compton, {5 -> 99}}                                                       |
|108    |{bia, {1 -> 5}}, {compton, {1 -> 1}}                                       |
|101    |{alcatraz, {1 -> 2}}                                                       |
|102    |{alcatraz, {1 -> 2}}                                                       |
|103    |{alcatraz, {1 -> 2, 2 -> 2, 3 -> 2}}                                       |
|104    |{alcatraz, {1 -> 4, 2 -> 2, 3 -> 2}}                                       |
+-------+---------------------------------------------------------------------------+

In the final map, the first level should be keyed by location (e.g. alcatraz, bia, compton), the second level should be keyed by a number representing a group, and the innermost value should be a count.

I got the current table by doing something like:

.groupBy(col("ID")).agg(collect_list(map($"LOCATION", map($"GROUP", $"COUNT"))) as "map")

JSON representation of desired format for more clarity

{
    "alcatraz": {
        "1": 100,
        "2": 300
    },
    "bia": {
        "2": 767
    },
    "compton": {
        "1": 888,
        "2": 999,
        "3": 1000
    }
}

I've seen other Stack Overflow posts about merging simple maps, but since this is a map of maps, those solutions didn't work.

I've been playing with a UDF but haven't had any luck. Is there an easy way to accomplish my goal in Scala?

CodePudding user response：

I ended up defining this UDF:

    import org.apache.spark.sql.functions.udf

    // Merge all nested maps of one row into a single map keyed by location.
    val joinMap = udf { values: Seq[Map[String, Map[String, Int]]] =>
      val newMap = scala.collection.mutable.Map[String, Map[String, Int]]()
      for (value <- values) {
        for ((k, v) <- value) {
          for ((sub_k, sub_v) <- v) {
            if (newMap.contains(k)) {
              // Location already seen: add this (group -> count) entry to its inner map.
              newMap(k) += (sub_k -> sub_v)
            } else {
              newMap(k) = v
            }
          }
        }
      }
      newMap.toMap
    }

You can apply it like this:

.withColumn("map", joinMap(col("map")))
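
The merge logic can also be checked outside Spark with a plain function. This is only a minimal sketch, assuming the same key and value types as the udf above; `mergeNested` is a hypothetical name and not part of Spark.

```scala
// Hypothetical helper: merges nested maps; on a repeated inner key the later value wins.
def mergeNested(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] =
  values.foldLeft(Map.empty[String, Map[String, Int]]) { (acc, m) =>
    m.foldLeft(acc) { case (a, (k, inner)) =>
      // `++` keeps entries from both maps; the right-hand side wins on key clashes
      a.updated(k, a.getOrElse(k, Map.empty[String, Int]) ++ inner)
    }
  }

// Input shaped like row 103 of the question, with String keys as in the udf
val row103 = Seq(
  Map("alcatraz" -> Map("1" -> 2)),
  Map("alcatraz" -> Map("2" -> 2)),
  Map("alcatraz" -> Map("3" -> 2))
)
val merged = mergeNested(row103)
```

Wrapping such a function with `udf(mergeNested _)` should give a column function comparable to `joinMap`, though that has not been verified against a particular Spark version.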

CodePudding user response：

If you have cats in scope, you can do it this way:

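def mergeMaps(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] = {
  import cats.implicits._
  // The explicit type on Map.empty is needed so the accumulator is not inferred as Map[Nothing, Nothing]
  values.foldLeft(Map.empty[String, Map[String, Int]])(_ |+| _)
}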

(Note: cats offers even more compact syntax for that, but that would make it even less obvious what's happening under the hood.)

But beware: in case two "inner maps" have the same keys, the above would add the integer values, so Seq(Map("1" -> Map("1" -> 1)), Map("1" -> Map("1" -> 1))) would be merged to Map("1" -> Map("1" -> 2)). If that's a problem and you need the exact same behavior as in your solution, you could modify the solution like this:

def mergeMaps(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] = {
  import cats.implicits._
  // Semigroup for Int that keeps the right-hand value instead of adding
  implicit val takeRightIntSemiGroup: cats.Semigroup[Int] = new cats.Semigroup[Int] {
    def combine(x: Int, y: Int): Int = y
  }
  values.foldLeft(Map.empty[String, Map[String, Int]])(_ |+| _)
}
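
The difference between the two behaviours (adding counts versus keeping the later one) can also be seen without cats. The following is a sketch in plain Scala; `mergeWith` is a hypothetical helper, parameterized by how two counts for the same inner key are combined.

```scala
// Hypothetical helper: `combine` decides what happens when both maps hold the same inner key.
def mergeWith(values: Seq[Map[String, Map[String, Int]]])(
    combine: (Int, Int) => Int
): Map[String, Map[String, Int]] =
  values.foldLeft(Map.empty[String, Map[String, Int]]) { (acc, m) =>
    m.foldLeft(acc) { case (outer, (k, inner)) =>
      val existing = outer.getOrElse(k, Map.empty[String, Int])
      // Fold the new inner entries into the inner map already stored for this outer key
      val mergedInner = inner.foldLeft(existing) { case (in, (subK, v)) =>
        in.updated(subK, in.get(subK).map(old => combine(old, v)).getOrElse(v))
      }
      outer.updated(k, mergedInner)
    }
  }

val input = Seq(Map("1" -> Map("1" -> 1)), Map("1" -> Map("1" -> 1)))
val summed = mergeWith(input)(_ + _)                  // counts for a repeated key are added
val lastWins = mergeWith(input)((_, later) => later)  // the later count replaces the earlier one
```

Passing the combining function explicitly makes the choice visible at the call site, rather than relying on which implicit instance is in scope.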