Scala / How to remove duplicates of an array of tuples based on two values?

Time:02-24

I have an array with millions of tuple elements like:

import scala.collection.mutable.ArrayBuffer

val arr: ArrayBuffer[(String, String)] = ArrayBuffer[(String, String)]()
arr += (("Hamburg", "Street1"))
arr += (("Hamburg", "Street2"))
arr += (("Hamburg", "Street1")) // duplicate - remove
arr += (("Berlin",  "StreetA"))
arr += (("Berlin",  "StreetZ"))
arr += (("Berlin",  "StreetZ")) // duplicate - remove
arr += (("Berlin",  "StreetA")) // duplicate - remove

I would now like to remove the duplicates from that array, i.e. the entries where both City AND Street are equal. Something like:

arr.distinctBy(_._1&_._2) // doesn't work, just for illustration

Is there a simple way to do this and get output like:

(("Hamburg", "Street1"))
(("Hamburg", "Street2"))
(("Berlin",  "StreetA"))
(("Berlin",  "StreetZ"))

CodePudding user response:

Since equals and hashCode are overridden for tuples, you can use distinct, which is effectively distinctBy(identity):

val result = arr.distinct
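
To make that concrete, here is a minimal sketch using the sample data from the question. The object name DistinctDemo and the value names are only illustrative, and distinctBy assumes Scala 2.13 or later. It shows distinct on whole tuples, the equivalent distinctBy with an explicit (city, street) key, and, for contrast, a key on the city alone:

```scala
import scala.collection.mutable.ArrayBuffer

object DistinctDemo {
  // Sample data from the question, including the three duplicates.
  val arr: ArrayBuffer[(String, String)] = ArrayBuffer(
    ("Hamburg", "Street1"),
    ("Hamburg", "Street2"),
    ("Hamburg", "Street1"),
    ("Berlin", "StreetA"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetA")
  )

  // distinct compares whole tuples, so an element is dropped only
  // when both city and street match an earlier element.
  val deduped: ArrayBuffer[(String, String)] = arr.distinct

  // Same result with an explicit key (requires Scala 2.13+).
  val dedupedByKey: ArrayBuffer[(String, String)] =
    arr.distinctBy(t => (t._1, t._2))

  // For contrast: keying on the city alone keeps only the first
  // street seen for each city.
  val onePerCity: ArrayBuffer[(String, String)] = arr.distinctBy(_._1)

  def main(args: Array[String]): Unit =
    deduped.foreach(println)
}
```

Both distinct and distinctBy keep the first occurrence of each element and preserve the original order, which matters if the order of the streets is meaningful.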

CodePudding user response:

Calling arr.toSet on your array will do what you require:

arr.toSet 
res34: Set[(String, String)] = Set(
  ("Hamburg", "Street1"),
  ("Hamburg", "Street2"),
  ("Berlin", "StreetA"),
  ("Berlin", "StreetZ")
)

Tuples are case classes, so they come with equals and hashCode methods that compare their elements, which is what makes the set treat equal (city, street) pairs as one entry.

If your use case is to ensure a collection contains no duplicates, you should generally use a set. This allows other readers of your code to infer that there are no duplicates in the collection.
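
One caveat worth sketching: a general Set makes no promise about iteration order, so with millions of elements the result of toSet will usually not follow insertion order. If order matters, a mutable LinkedHashSet is one option. The snippet below is only an illustration under that assumption; the object name SetDemo and the value names are made up, and LinkedHashSet.from assumes Scala 2.13 or later:

```scala
import scala.collection.mutable.{ArrayBuffer, LinkedHashSet}

object SetDemo {
  // Sample data from the question, including the three duplicates.
  val arr: ArrayBuffer[(String, String)] = ArrayBuffer(
    ("Hamburg", "Street1"),
    ("Hamburg", "Street2"),
    ("Hamburg", "Street1"),
    ("Berlin", "StreetA"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetA")
  )

  // Immutable set: duplicates are gone, iteration order is unspecified.
  val unique: Set[(String, String)] = arr.toSet

  // LinkedHashSet: duplicates are gone and insertion order is kept.
  val ordered: LinkedHashSet[(String, String)] = LinkedHashSet.from(arr)

  def main(args: Array[String]): Unit =
    ordered.foreach(println)
}
```

Declaring the collection as a set from the start, instead of filling an ArrayBuffer and converting afterwards, also avoids holding the duplicates in memory at all.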
