I'm migrating a project from Scala 2.12.1 to 2.13.6, and find that SeqView#flatMap now returns a View, which doesn't have a distinct method. As a result, one bit of code no longer compiles:
val nodes = debts.view
  .flatMap { case Debt(from, to, _) => List(from, to) }
  .distinct // does not compile in 2.13: the flatMap result is a View, which has no distinct
  .map(name => (name, new Node(name)))
  .toMap
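To see where the error comes from, here is a minimal sketch. The `Debt` case class and its fields are hypothetical stand-ins, since the question doesn't show their definitions, and `debts` is assumed to be a `List`. In 2.13, `List#view` gives a `SeqView`, which still has `distinct`, but `flatMap` on it is declared to return a plain `View`, which does not.

```scala
import scala.collection.{SeqView, View}

object ViewTypes {
  // Hypothetical stand-in for the question's Debt type; field names and types are assumed.
  final case class Debt(from: String, to: String, amount: Int)

  val debts: List[Debt] = List(Debt("a", "b", 10), Debt("b", "c", 5))

  // List#view returns a SeqView, which inherits distinct from SeqOps.
  val seqView: SeqView[Debt] = debts.view
  val distinctDebts: List[Debt] = seqView.distinct.toList

  // flatMap on a SeqView is typed as a plain View, which has no distinct member.
  val names: View[String] = seqView.flatMap { case Debt(from, to, _) => List(from, to) }
  // names.distinct // would not compile in 2.13
}
```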
There's a dumb way to fix it, by converting the view to a seq and then back to a view:
val nodes = debts.view
  .flatMap { case Debt(from, to, _) => List(from, to) }.toSeq.view // toSeq materializes the whole view
  .distinct
  .map(name => (name, new Node(name)))
  .toMap
However, this is obviously not great, because it forces the view to be collected, and it's also very inelegant to have to go back and forth between types. I found another way to fix it, which is to use a LazyList:
val nodes = debts.to(LazyList)
  .flatMap { case Debt(from, to, _) => List(from, to) }
  .distinct
  .map(name => (name, new Node(name)))
  .toMap
Now that's what I want: it basically behaves like a Java stream. Sure, some operations such as distinct have O(n) memory usage, but at least all operations after it are streamed, without reconstructing the data structure.
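The claim that operations after `distinct` are still streamed can be checked with a counter. This is a small sketch with made-up integer data, and `LazyListDownstream.demo` is a hypothetical helper: only the two elements actually requested by `take(2)` pass through the final `map`.

```scala
object LazyListDownstream {
  // Hypothetical helper: returns the result and how many elements reached the final map.
  def demo(): (List[Int], Int) = {
    var mapped = 0
    val result = LazyList(1, 2, 2, 3, 3, 4)
      .distinct                         // remembers the distinct elements seen so far (the O(n) memory)
      .map { x => mapped += 1; x * 10 } // lazy: runs only for elements that are demanded
      .take(2)
      .toList                           // demands exactly two elements
    (result, mapped)
  }
}
```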
This makes me wonder why we would ever need a view, given that views are much less powerful than before (even if I can believe 2.13 fixed some other issues that this power was introducing). I looked for the answer and found hints, but nothing comprehensive enough. Below is my research:
- Description of views in 2.13
- StackOverflow: What is the difference between List.view and LazyList?
- Another comparison on an external website
It might be me, but even after reading these references, I don't find a clear upside in using views, for most if not all use cases. Anyone more enlightened than me?
Answer:
There are actually 3 basic possibilities for lazy sequences in Scala 2.13: View, Iterator and LazyList.
View is the simplest lazy sequence, with very little additional cost. It's a good default in the general case, to avoid allocating intermediate results when working with large sequences.
A View can be traversed several times (using foreach, foldLeft, toMap, etc.), but its transformations (map, flatMap, filter, etc.) are executed again on every traversal. So take care either to avoid time-consuming transformations or to traverse the View only once.
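A quick way to see the repeated work is to count how often the mapping function runs. This is an illustrative sketch with made-up data, and `countViewMapCalls` is a hypothetical helper name: summing the same view twice runs the `map` once per element per traversal.

```scala
object ViewRecompute {
  // Hypothetical helper: returns (first sum, second sum, number of times the map function ran).
  def countViewMapCalls(): (Int, Int, Int) = {
    var calls = 0
    val doubled = List(1, 2, 3).view.map { x => calls += 1; x * 2 }
    val first = doubled.sum  // traversal 1: map runs for all 3 elements
    val second = doubled.sum // traversal 2: map runs again for all 3 elements
    (first, second, calls)
  }
}
```

If a view has to be traversed more than once, materializing it first (for example with `.toList`) avoids the repeated work, at the cost of allocating the collection.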
Iterator can be traversed only once. It's similar to Java Streams or Python generators. Most transformation methods on Iterator require that you use only the returned Iterator and discard the original one.
It's also fast, like a View, and supports more operations, including distinct.
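Both properties are easy to observe in a short sketch (made-up data; `IteratorOnce.demo` is a hypothetical helper). A second traversal of the same Iterator finds nothing left, and `distinct` is available directly on Iterator in 2.13.

```scala
object IteratorOnce {
  // Hypothetical helper: shows single-pass consumption and Iterator#distinct.
  def demo(): (Int, Int, List[Int]) = {
    val it = Iterator(1, 2, 3)
    val firstSum = it.sum  // consumes every element
    val secondSum = it.sum // the iterator is exhausted, so this sums nothing: 0
    val unique = Iterator(1, 2, 1, 3, 2).distinct.toList // keeps first occurrences, in order
    (firstSum, secondSum, unique)
  }
}
```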
LazyList is basically a real, concrete data structure (like List) that is expanded automatically on the fly as elements are requested. LazyList memoizes all the generated elements: if you keep a LazyList in a val, memory stays allocated for every element generated so far. But if you traverse it on the fly without storing it in a val, the garbage collector can clean up the traversed elements.
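Memoization is the mirror image of the View behaviour: with the same counting trick, a LazyList runs its `map` only once per element, however many times it is traversed. A sketch with made-up data; `LazyListMemo.demo` is a hypothetical helper.

```scala
object LazyListMemo {
  // Hypothetical helper: returns (first sum, second sum, number of times the map function ran).
  def demo(): (Int, Int, Int) = {
    var calls = 0
    val doubled = LazyList(1, 2, 3).map { x => calls += 1; x * 2 }
    val first = doubled.sum  // forces the elements; each computed value is cached
    val second = doubled.sum // reuses the cached values, so map does not run again
    (first, second, calls)
  }
}
```

The flip side is that, while `doubled` is referenced, all of its computed elements stay in memory.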
Stream in Scala 2.12 was considerably slower than Views or Iterators. I'm not sure if this applies to LazyList in Scala 2.13.
So every lazy sequence has some caveat:
- View can execute transformations multiple times.
- Iterator can be consumed only once.
- LazyList can keep memory allocated for all the sequence elements.
For your use case, I believe Iterator is the most appropriate:
val nodes = debts.iterator
  .flatMap { case Debt(from, to, _) => List(from, to) }
  .distinct
  .map(name => (name, new Node(name)))
  .toMap
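For reference, here is the suggested pipeline as a self-contained sketch. `Debt` and `Node` are hypothetical stand-ins (their fields are guessed from the question's pattern match and constructor call), and `buildNodes` is an illustrative wrapper rather than anything from the original code.

```scala
object IteratorNodes {
  // Hypothetical stand-ins for the question's types; the real Debt and Node may differ.
  final case class Debt(from: String, to: String, amount: Int)
  final class Node(val name: String)

  def buildNodes(debts: Seq[Debt]): Map[String, Node] =
    debts.iterator
      .flatMap { case Debt(from, to, _) => List(from, to) } // each debt yields both endpoints
      .distinct                                             // Iterator#distinct tracks names already emitted
      .map(name => (name, new Node(name)))
      .toMap                                                // single pass, no intermediate collections
}
```

Because the Iterator is consumed by `toMap` in a single pass, the "traverse only once" restriction costs nothing here.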