Current
------- ---------------------------------------------------------------------------
|ID |map |
------- ---------------------------------------------------------------------------
|105 |[{bia, {4 -> 1}}, {compton, {5 -> 1}}, {alcatraz, {3 -> 6}}] |
|106 |[{compton, {4 -> 5}}] |
|107 |[{compton, {5 -> 99}}] |
|108 |[{bia, {1 -> 5}}, {compton, {1 -> 1}}] |
|101 |[{alcatraz, {1 -> 2}}] |
|102 |[{alcatraz, {1 -> 2}}] |
|103 |[{alcatraz, {1 -> 2}}, {alcatraz, {2 -> 2}}, {alcatraz, {3 -> 2}}] |
|104 |[{alcatraz, {1 -> 4}}, {alcatraz, {2 -> 2}}, {alcatraz, {3 -> 2}}] |
------- ---------------------------------------------------------------------------
Desired
------- ---------------------------------------------------------------------------
|ID |map |
------- ---------------------------------------------------------------------------
|105 |{bia, {4 -> 1}}, {compton, {5 -> 1}}, {alcatraz, {3 -> 6}} |
|106 |{compton, {4 -> 5}} |
|107 |{compton, {5 -> 99}} |
|108 |{bia, {1 -> 5}}, {compton, {1 -> 1}} |
|101 |{alcatraz, {1 -> 2}} |
|102 |{alcatraz, {1 -> 2}} |
|103 |{alcatraz, {1 -> 2, 2 -> 2, 3 -> 2}} |
|104 |{alcatraz, {1 -> 4, 2 -> 2, 3 -> 2}} |
------- ---------------------------------------------------------------------------
I want the final map to be keyed at the first level by location (e.g. alcatraz, bia, compton) and at the second level by a number representing a group, with the innermost value being a count.
I got the current table by doing something like:
.groupBy(col("ID")).agg(collect_list(map($"LOCATION", map($"GROUP", $"COUNT"))) as "map")
JSON representation of the desired format, for more clarity:
{
"alcatraz": {
"1": 100,
"2": 300
},
"bia": {
"2": 767
},
"compton": {
"1": 888,
"2": 999,
"3": 1000
}
}
I've seen other Stack Overflow posts about merging simple maps, but since this is a map of maps, those solutions didn't work.
I've been experimenting with a udf but haven't had any luck. Is there an easy way to accomplish this in Scala?
CodePudding user response:
I ended up defining this udf:
import org.apache.spark.sql.functions.udf

val joinMap = udf { values: Seq[Map[String, Map[String, Int]]] =>
  // Accumulates one inner map (group -> count) per location key.
  val newMap = scala.collection.mutable.Map[String, Map[String, Int]]()
  for (value <- values; (k, v) <- value) {
    // Merge into whatever was already collected for this location;
    // on a duplicate group key the later count replaces the earlier one.
    newMap(k) = newMap.getOrElse(k, Map.empty[String, Int]) ++ v
  }
  // Return an immutable map.
  newMap.toMap
}
You can use this by doing
.withColumn("map", joinMap(col("map")))
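To check the merge logic without a Spark session, the same idea can be sketched in plain Scala. This is only an illustration: mergeNested is a hypothetical helper name that is not part of the answer above, and the group keys are written as strings to match the udf's signature.

```scala
// Hypothetical helper (not from the original answer): merges a sequence of
// location -> (group -> count) maps using only the standard library.
def mergeNested(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] =
  values.foldLeft(Map.empty[String, Map[String, Int]]) { (acc, value) =>
    value.foldLeft(acc) { case (outer, (location, groups)) =>
      // Combine with the groups already seen for this location; on a
      // duplicate group key the later count replaces the earlier one.
      outer.updated(location, outer.getOrElse(location, Map.empty[String, Int]) ++ groups)
    }
  }

// Row 103 from the question.
val row103 = Seq(
  Map("alcatraz" -> Map("1" -> 2)),
  Map("alcatraz" -> Map("2" -> 2)),
  Map("alcatraz" -> Map("3" -> 2))
)
val merged103 = mergeNested(row103)
```

For row 103 this collapses the three single-entry maps into one alcatraz entry holding all three groups, which is the shape shown in the Desired table.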
CodePudding user response:
If you have cats in scope, you can do it this way:
def mergeMaps(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] = {
  import cats.implicits._
  // |+| is the cats Semigroup combine operator.
  values.foldLeft(Map.empty[String, Map[String, Int]])(_ |+| _)
}
(Note: cats offers even more compact syntax for that, but that would make it even less obvious what's happening under the hood.)
But beware: if two "inner maps" share a key, the above adds the integer values, so Seq(Map("1" -> Map("1" -> 1)), Map("1" -> Map("1" -> 1))) would be merged to Map("1" -> Map("1" -> 2)). If that's a problem and you need exactly the same behavior as in your solution, you could modify the solution like this:
def mergeMaps(values: Seq[Map[String, Map[String, Int]]]): Map[String, Map[String, Int]] = {
  import cats.implicits._
  // Semigroup that keeps the right-hand value instead of adding.
  implicit val takeRightIntSemiGroup: cats.Semigroup[Int] = new cats.Semigroup[Int] {
    def combine(x: Int, y: Int): Int = y
  }
  values.foldLeft(Map.empty[String, Map[String, Int]])(_ |+| _)
}
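The difference between the two behaviors (adding versus keeping the later value) can also be seen without cats. The following is a minimal standard-library sketch, assuming a hypothetical helper mergeWith that takes the conflict rule as a parameter; it is not how cats implements |+| internally.

```scala
// Hypothetical helper: merges nested maps and resolves duplicate inner keys
// with the supplied `combine` function (existing value first, new value second).
def mergeWith(values: Seq[Map[String, Map[String, Int]]])(
    combine: (Int, Int) => Int): Map[String, Map[String, Int]] =
  values.foldLeft(Map.empty[String, Map[String, Int]]) { (acc, value) =>
    value.foldLeft(acc) { case (outer, (key, groups)) =>
      val existing = outer.getOrElse(key, Map.empty[String, Int])
      // Insert each group, combining with the stored count when the key repeats.
      val mergedGroups = groups.foldLeft(existing) { case (m, (group, count)) =>
        m.updated(group, m.get(group).map(combine(_, count)).getOrElse(count))
      }
      outer.updated(key, mergedGroups)
    }
  }

// The duplicate-key input from the answer above.
val dup = Seq(Map("1" -> Map("1" -> 1)), Map("1" -> Map("1" -> 1)))

val summed   = mergeWith(dup)(_ + _)        // duplicate counts are added
val lastWins = mergeWith(dup)((_, y) => y)  // the later count replaces the earlier one
```

Passing the rule explicitly makes the choice visible at the call site, at the cost of a bit more code than the cats one-liner.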