Home > Net >  Filter a Scala Seq[(String, String)] using a Seq[String]
Filter a Scala Seq[(String, String)] using a Seq[String]

Time:08-24

I have this Seq[(String, String)] :

val tupleSeq: Seq[(String, String)] = Seq(
  ("aaa", "A_A_A"),
  ("bbb", "B_B_B"),
  ("ccc", "C_C_C")
)

I want to use the given seqA on tupleSeq:

val seqA: Seq[String] = Seq("aaa", "bbb")

In order to obtain :

val seqB: Seq[String] = Seq("A_A_A", "B_B_B")

Any ideas ?

CodePudding user response:

One approach is to use the data unaltered.

// The size of `data` is M
// The size of `query` is N

val data: Seq[(String, String)] = Seq(
  ("aaa", "A_A_A"),
  ("bbb", "B_B_B"),
  ("ccc", "C_C_C")
)

val query: Seq[String] = Seq("aaa", "bbb")

// Use the data as is
// O(M * N)

for {
  (key, value) <- data
  lookup <- query
  if key == lookup
} yield value

The problem with this approach is that the overall complexity is O(M * N), where M and N are the sizes of the data and query collections. This might be completely acceptable if either M or N are known to be very small and can be further improved in practical terms by making use of functions that can terminate early (like find, exemplified in another answer).

If M and N are reasonably large, you might want to spend the time necessary to convert them into an appropriate data structure (which consumes time and space in a way which is linear to the size of the collection).

Depending on which size you expect to be larger you might want to either turn the data into a map and look up the relevant keys or turn the query into a set and iterate each key in the map to find which is relevant.

I would expect the data to be queried in most cases to be larger than the query, so probably you may want to turn the data into a map. Keeping the map around would also allow you to query it multiple times without suffering from the time to turn it into a more appropriate structure for querying.

// Turn the query into a set and iterate the data
// O(M)

val lookups = query.toSet
data.collect {
  case (key, value) if lookups.contains(key) => value
}

// Turn the data into a map and iterate the query
// O(N)

val map = data.toMap
query.collect(map)

You can play around with this code here on Scastie.

CodePudding user response:

val tupleSeq: Seq[(String, String)] = Seq(
  ("aaa", "A_A_A"),
  ("bbb", "B_B_B"),
  ("ccc", "C_C_C")
)

val seqA: Seq[String] = Seq("aaa", "bbb")

// List(A_A_A, B_B_B)
val seqB = for {
      key <- seqA
      value <- tupleSeq.find(_._1 == key).map(_._2)
    } yield value

CodePudding user response:

You can try something like this:

val seqB = tupleSeq.filter{x => seqA.contains(x._1)}.map(x => x._2)

It filters the sequence and keeps the tuples where the first value is part of your second sequence, and then maps the tuples to the second value.

seqB.foreach(println) then outputs this:

A_A_A
B_B_B

CodePudding user response:

Your tupleSeq naturally looks like a Map of key-to-value pairs, so you should treat it like one. The code becomes very simple with this observation:

  val myMap = tupleSeq.toMap
  val seqB  = seqA.collect(myMap) // List(A_A_A, B_B_B)

For additional space complexity, you get O(1) amortized time complexity for your query, which is fine for your use case.

Note the use of collect instead of map because it discards keys that do not have a mapping value in your Map.

  • Related