Filtering a map of type [String, Seq[Int]] by value

Time:07-09

I want to filter a map of type Map[String, Seq[Int]] by its value (the Seq). As an example:

val data: Map[String, Seq[Int]] = Map("a" -> Seq(1,2,3,4,5,6), "b" -> Seq(2,3,4,5,6,7), "c" -> Seq(3,4,5,6,7,8), "d"->Seq(1))

For this data, I need to extract specific values from each Seq if they are present, and leave the entry out of the resulting map if they are not. If I am looking for the subarray (4,5) in the values of the map, the output should be

Map("a" -> Seq(4,5), "b" -> Seq(4,5), "c" -> Seq(4,5))

I tried to use the following lines of code, but they don't work as expected:

val filtered_data_1 = data.filter( { case (_, value) => (value.filter(v => v.equals((4,5)))).length > 0 })

val filtered_data_2 = data.filter(d => (d._2.filter(v => v.equals((4,5)))).length > 0)

Please let me know what the issue with my code is and how to correct it. Thank you!

CodePudding user response:

You were close. This can probably be optimized into a single map function, but this gives you the idea:

scala> val data: Map[String, Seq[Int]] = Map("a" -> Seq(1,2,3,4,5,6), "b" -> Seq(2,3,4,5,6,7), "c" -> Seq(3,4,5,6,7,8), "d"->Seq(1))
data: Map[String,Seq[Int]] = Map(a -> List(1, 2, 3, 4, 5, 6), b -> List(2, 3, 4, 5, 6, 7), c -> List(3, 4, 5, 6, 7, 8), d -> List(1))

scala> val subSeq = Seq(4,5)
subSeq: Seq[Int] = List(4, 5)

scala> val filtered_data_1 = data.filter( { case (_, value) => subSeq.forall(value.contains)}).map( { case (k, v) => (k,subSeq)}).toMap
filtered_data_1: scala.collection.immutable.Map[String,Seq[Int]] = Map(a -> List(4, 5), b -> List(4, 5), c -> List(4, 5))
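
As for why the attempts in the question return nothing, here is a minimal sketch. Each v is an Int, while (4,5) is a Tuple2, so equals is false for every element and every entry is dropped. The sketch also contrasts the membership test used above with containsSlice, which may be closer to what "subarray" means if the values must appear next to each other and in order. The object name and the scattered sequence are hypothetical and serve only as illustration.

```scala
object WhyTheAttemptFails {
  val data: Map[String, Seq[Int]] = Map(
    "a" -> Seq(1, 2, 3, 4, 5, 6),
    "b" -> Seq(2, 3, 4, 5, 6, 7),
    "c" -> Seq(3, 4, 5, 6, 7, 8),
    "d" -> Seq(1)
  )
  val subSeq: Seq[Int] = Seq(4, 5)

  // The attempt from the question: an Int is never equal to the tuple (4, 5),
  // so the inner filter is always empty and no entry survives.
  val attempt: Map[String, Seq[Int]] =
    data.filter { case (_, value) => value.filter(v => v.equals((4, 5))).length > 0 }

  // Hypothetical sequence: contains 4 and 5, but not as a contiguous run.
  val scattered: Seq[Int] = Seq(5, 9, 4)

  // Membership: every element of subSeq occurs somewhere, in any order.
  val membershipOnScattered: Boolean = subSeq.forall(scattered.contains)

  // Contiguous subsequence: 4 immediately followed by 5.
  val sliceOnScattered: Boolean = scattered.containsSlice(subSeq)

  // On the data from the question both tests keep the same keys.
  val keysByMembership: Set[String] =
    data.filter { case (_, value) => subSeq.forall(value.contains) }.keySet
  val keysBySlice: Set[String] =
    data.filter { case (_, value) => value.containsSlice(subSeq) }.keySet

  def main(args: Array[String]): Unit = {
    println(attempt)               // empty map
    println(membershipOnScattered) // true
    println(sliceOnScattered)      // false
    println(keysByMembership == keysBySlice)
  }
}
```

For the data in the question the two tests agree, so the choice only matters when the wanted values can appear out of order or with gaps.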

CodePudding user response:

As the other answer mentioned, there is a shorter, more convenient approach to this. Most of the time, when you write this with Scala collections:

myCollection.filter(p).map(f) // chaining filter and map

It's better to use collect in these cases: the code is more readable and easier to follow, and, depending on the collection implementation, execution time is less than or equal to that of the first approach (at worst, the two are equal).

myCollection.collect { case elem if p(elem) => f(elem) }
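
To make the equivalence concrete, here is a small sketch on a plain list. The names numbers, isEven and square are hypothetical stand-ins for myCollection, p and f.

```scala
object FilterMapVsCollect {
  // Hypothetical sample input.
  val numbers: List[Int] = List(1, 2, 3, 4, 5, 6)

  val isEven: Int => Boolean = n => n % 2 == 0 // plays the role of p
  val square: Int => Int = n => n * n          // plays the role of f

  // Two steps: filter builds an intermediate list, then map transforms it.
  val chained: List[Int] = numbers.filter(isEven).map(square)

  // One step: the guard selects elements, the right-hand side transforms them.
  val collected: List[Int] = numbers.collect { case n if isEven(n) => square(n) }

  def main(args: Array[String]): Unit = {
    println(chained)   // List(4, 16, 36)
    println(collected) // List(4, 16, 36)
  }
}
```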

So in your case:

data.collect {
  case (key, value) if subSeq.forall(value.contains) =>
    key -> subSeq
}
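
Put together as a self-contained program, the collect version might look like the sketch below. The helper name keepMatching and the enclosing object are hypothetical. Because the partial function returns key/value pairs, collect on a Map yields a Map again, so no trailing toMap is needed.

```scala
object CollectOnMap {
  val data: Map[String, Seq[Int]] = Map(
    "a" -> Seq(1, 2, 3, 4, 5, 6),
    "b" -> Seq(2, 3, 4, 5, 6, 7),
    "c" -> Seq(3, 4, 5, 6, 7, 8),
    "d" -> Seq(1)
  )
  val subSeq: Seq[Int] = Seq(4, 5)

  // Hypothetical helper: keep only the entries whose Seq contains every
  // wanted element, and map each surviving key to the wanted elements.
  def keepMatching(m: Map[String, Seq[Int]], wanted: Seq[Int]): Map[String, Seq[Int]] =
    m.collect {
      case (key, value) if wanted.forall(value.contains) => key -> wanted
    }

  val result: Map[String, Seq[Int]] = keepMatching(data, subSeq)

  def main(args: Array[String]): Unit =
    println(result) // entries for a, b and c, each mapped to Seq(4, 5)
}
```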