I have an array with millions of tuple elements like:
import scala.collection.mutable.ArrayBuffer
val arr: ArrayBuffer[(String, String)] = ArrayBuffer[(String, String)]()
arr += (("Hamburg", "Street1"))
arr += (("Hamburg", "Street2"))
arr += (("Hamburg", "Street1")) // duplicate - remove
arr += (("Berlin", "StreetA"))
arr += (("Berlin", "StreetZ"))
arr += (("Berlin", "StreetZ")) // duplicate - remove
arr += (("Berlin", "StreetA")) // duplicate - remove
I would like to remove the duplicates from that array, i.e. the entries where both city AND street are equal. Something like:
arr.distinctBy(_._1&_._2) // doesn't work, just for illustration
Is there a simple way to do this, so that the output looks like:
(("Hamburg", "Street1"))
(("Hamburg", "Street2"))
(("Berlin", "StreetA"))
(("Berlin", "StreetZ"))
CodePudding user response:
Since equals and hashCode are overridden for tuples, you can use distinct, which is effectively distinctBy(identity):
val result = arr.distinct
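To make that concrete, here is a minimal sketch using the data from the question. It assumes Scala 2.13 or later for distinctBy (distinct itself is available in older versions too); the object name DistinctSketch and the onePerCity variation are only illustrative and not part of the original question.

```scala
import scala.collection.mutable.ArrayBuffer

object DistinctSketch {
  // Sample data from the question, including the duplicates.
  val arr: ArrayBuffer[(String, String)] = ArrayBuffer(
    ("Hamburg", "Street1"),
    ("Hamburg", "Street2"),
    ("Hamburg", "Street1"),
    ("Berlin", "StreetA"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetA")
  )

  // distinct compares whole tuples (city AND street) via equals/hashCode
  // and keeps the first occurrence of each, in the original order.
  val unique = arr.distinct

  // The same thing spelled out with distinctBy and an explicit key.
  val uniqueByBoth = arr.distinctBy(t => (t._1, t._2))

  // Hypothetical variation: distinctBy is only needed when the key is
  // NOT the whole element, e.g. keeping one entry per city.
  val onePerCity = arr.distinctBy(_._1)
}
```

Because the whole tuple is the key here, distinctBy adds nothing over distinct; it becomes useful once you want to deduplicate on just part of the element.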
CodePudding user response:
Calling arr.toSet on your array will do what you require:
arr.toSet
res34: Set[(String, String)] = Set(
("Hamburg", "Street1"),
("Hamburg", "Street2"),
("Berlin", "StreetA"),
("Berlin", "StreetZ")
)
Tuples are case classes, so they come with equals and hashCode methods that compare them by value.
If your use case is to ensure a collection contains no duplicates, you should generally use a Set. This allows other readers of your code to infer that there are no duplicates in the collection.
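One caveat: a general Set makes no promise about iteration order, so the tidy ordering in the small example above should not be relied on for millions of elements. If insertion order matters, one option is to collect into a mutable.LinkedHashSet from the start. The following is only a sketch under that assumption; the names SetSketch, streets, addedAgain and asList are made up for illustration.

```scala
import scala.collection.mutable

object SetSketch {
  // LinkedHashSet ignores duplicate inserts and iterates in insertion order.
  val streets: mutable.LinkedHashSet[(String, String)] =
    mutable.LinkedHashSet.empty[(String, String)]

  streets += (("Hamburg", "Street1"))
  streets += (("Hamburg", "Street2"))
  streets += (("Hamburg", "Street1")) // duplicate, silently ignored
  streets += (("Berlin", "StreetA"))
  streets += (("Berlin", "StreetZ"))
  streets += (("Berlin", "StreetZ")) // duplicate, silently ignored

  // add returns false when the element was already present.
  val addedAgain: Boolean = streets.add(("Berlin", "StreetA"))

  // Convert back to an ordered sequence when one is needed.
  val asList: List[(String, String)] = streets.toList
}
```

This way the duplicates are never stored in the first place, which may matter for memory when the input is large.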