Scala: checking whether every element of a vector is traversed when joining two vectors

Time:10-14

I have 2 vectors as below.

val vecBase21=....sortBy(r=>(r._1,r._2))
    vecBase21: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210624 0400,2.0), (036,20210624 0405,2.0), (036,20210624 0410,2.0), (036,20210624 0415,2.0), (036,20210624 0420,2.0),...)

val vecBase22=....sortBy(r=>(r._1,r._2))    
    vecBase22: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210625 0400,2.0), (036,20210625 0405,2.0), (036,20210625 0410,2.0), (036,20210625 0415,2.0), (036,20210625 0420,2.0),...)

In each tuple, x._1 is the ID, x._2 is the date-time, and x._3 is the value. I then created a third vector as follows.

val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map(y=>(x._1,x._2,x._3,y._3))).flatten

This is effectively a SQL join: a join b on a.id=b.id and a.date_time=b.date_time. For each combination of ID and date_time from vecBase21, it loops over vecBase22 to search for a match. Since each combination is unique within a vector and both vectors are sorted, I want to find out whether the loop over vecBase22 stops once it finds a match or runs to the end of vecBase22 regardless. I tried this:

val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map{y=>
  println("x1=" + x._1 + " y1=" + y._1 + " x2=" + x._2 + " y2=" + y._2)
  (x._1,x._2,x._3,y._3)}).flatten

But this apparently prints only the matched pairs. Is there a way to print every combination from the two vectors that is evaluated for a match?

CodePudding user response:

As each combination is unique in one vector and they are sorted, I want to find out whether the loop in vecBase22 stops once it finds a match or it loops till the end of vecBase22 anyway

When you call filter on vecBase22 you loop through every element of that collection to see if it matches the predicate. This returns a new collection and passes it to the map function. If you want to short-circuit the filtering process you could consider using the method collectFirst (Scala 2.12):

def collectFirst[B](pf: PartialFunction[A, B]): Option[B]

Finds the first element of the traversable or iterator for which the given partial function is defined, and applies the partial function to it.

Note: may not terminate for infinite-sized collections.

Note: might return different results for different runs, unless the underlying collection type is ordered.

pf:  the partial function
returns: an option value containing pf applied to the first value for which it is defined, or None if none exists.

Example:

    Seq("a", 1, 5L).collectFirst({ case x: Int => x*10 }) = Some(10)

So you could do something like:

val vecBase30: Vector[(String, String, Double, Double)] = vecBase21
  .flatMap(x => vecBase22.collectFirst {
      case matched: (String, String, Double) if x._1 == matched._1 && x._2 == matched._2 => (x._1, x._2, x._3, matched._3)
    })

CodePudding user response:

First off: yes, it loops through all items of vecBase22 for each item of vecBase21. That is what map and filter do.

If the println prints nothing at all, you are probably running your code in an environment that loses standard output. Some notebook, maybe?
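To see every pair that gets evaluated, one option is to put the println inside the filter predicate: the predicate runs for every candidate pair, whereas the body of the map after the filter only runs for matches. Here is a minimal sketch, assuming small made-up sample data. The values, the object name TraceJoin, and the comparisons counter are illustrative and not from the question.

```scala
object TraceJoin {
  // Hypothetical sample data in the same (id, dateTime, value) shape as the question.
  val vecBase21: Vector[(String, String, Double)] = Vector(
    ("036", "20210624 0400", 2.0),
    ("036", "20210624 0405", 2.0)
  )
  val vecBase22: Vector[(String, String, Double)] = Vector(
    ("036", "20210624 0400", 3.0),
    ("036", "20210624 0405", 4.0),
    ("036", "20210624 0410", 5.0)
  )

  // Counts how many (x, y) pairs the predicate is evaluated on.
  var comparisons = 0

  val vecBase30: Vector[(String, String, Double, Double)] =
    vecBase21.flatMap { x =>
      vecBase22
        .filter { y =>
          // Runs for every element of vecBase22, matched or not.
          comparisons += 1
          println("x1=" + x._1 + " y1=" + y._1 + " x2=" + x._2 + " y2=" + y._2)
          x._1 == y._1 && x._2 == y._2
        }
        .map(y => (x._1, x._2, x._3, y._3))
    }

  def main(args: Array[String]): Unit = {
    println(vecBase30)
    println("comparisons = " + comparisons)
  }
}
```

With this sample data the predicate is evaluated once per pair, so the counter ends up at the product of the two vector sizes, which confirms that filter never stops early.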

Also, if you want it to stop once it finds a match, use Seq.find.
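As a rough sketch of what that could look like, find returns an Option holding the first matching element, and flatMap drops the None results. The sample data, the object name FindJoin, and the comparisons counter below are assumptions made for illustration.

```scala
object FindJoin {
  // Hypothetical sample data in the same (id, dateTime, value) shape as the question.
  val vecBase21: Vector[(String, String, Double)] = Vector(
    ("036", "20210624 0400", 2.0),
    ("036", "20210624 0405", 2.0)
  )
  val vecBase22: Vector[(String, String, Double)] = Vector(
    ("036", "20210624 0400", 3.0),
    ("036", "20210624 0405", 4.0),
    ("036", "20210624 0410", 5.0)
  )

  // Counts how many (x, y) pairs the predicate is evaluated on.
  var comparisons = 0

  val vecBase30: Vector[(String, String, Double, Double)] =
    vecBase21.flatMap { x =>
      vecBase22
        .find { y =>
          // find stops calling this predicate after the first true result.
          comparisons += 1
          x._1 == y._1 && x._2 == y._2
        }
        .map(y => (x._1, x._2, x._3, y._3))
    }

  def main(args: Array[String]): Unit = {
    println(vecBase30)
    println("comparisons = " + comparisons)
  }
}
```

The result is the same as with filter only because each ID and date-time combination is unique within vecBase22. When an element of vecBase21 has no match, find still scans all of vecBase22 before returning None.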

Finally, you can improve readability. Here are a few ideas:

  • use case classes instead of tuples
  • add spaces around operators
  • start a new line before each chained (monadic) operation if the expression doesn't fit on one line
  • use flatMap instead of map followed by flatten
  • add type annotations to vals (not necessary, but it helps when reading the code)

That gives:

case class Item(id: String, time: String, value: Double)
case class Joint(id: String, time: String, v1: Double, v2: Double)

val vecBase21: Vector[Item] = ....sortBy(item => (item.id, item.time))
val vecBase22: Vector[Item] = ....sortBy(item => (item.id, item.time))

val vecBase30: Vector[Joint] = vecBase21.flatMap( x =>
  vecBase22
    .filter( y => x.id == y.id && x.time == y.time)
    .map( y => Joint(x.id, x.time, x.value, y.value))
)