I am trying to convert java.util.StringTokenizer
to Scala's Iterator
and the following approach fails:
def toIterator(st: StringTokenizer): Iterator[String] =
Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens()))
But this works:
def toIterator(st: StringTokenizer): Iterator[String] =
Iterator.fill(st.countTokens())(st.nextToken())
You can see this in Scala console:
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens())).toList
res1: List[String] = List(a, c)
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.fill(st.countTokens())(st.nextToken())).toList
res2: List[String] = List(a, b, c, d)
As you can see res1
is incorrect and res2
is correct. What am I doing wrong? The first version should work and is preferred since it is 2x faster than second approach since it does not scan the string twice
CodePudding user response:
takeWhile
is not intended to be used statefully. It should take a pure function that determines, solely based on the input, whether or not to continue.
Specifically, the iterator must produce the value before the takeWhile
predicate gets called. Even though your function ignores the takeWhile
argument, it still gets evaluated. So nextToken
gets called and then we check for more tokens.
To be perfectly precise, in your "a b"
case,
- First, we call
nextToken
, which is whatIterator.continually
does. There's a next token, so it returns"a"
. - Now, to determine if we should include the next token, we call your predicate with
"a"
as argument. Your predicate ignores"a"
and callshasMoreTokens
. Our tokenizer has more tokens (namely,"b"
), so it returns true. Continue. - Now we call
nextToken
again. This returns"b"
. - We need to determine if we should include this in our result, so our
takeWhile
predicate runs with"b"
as argument. OurtakeWhile
predicate ignores its argument and callshasMoreTokens
. We have no more tokens anymore, so this returnsfalse
. We should not include this element. takeWhile
has returned false, so we stop at the last element for which it returned true. Our resulting list isList("a")
.
Since we're abusing a pure functional technique like takeWhile
to be stateful, we get unintuitive results.
As much as it looks snazzy and clever to have a one-line solution, what you have is a stateful, imperative object that you want to adapt to the Iterator
interface. Hiding that statefulness in a bunch of pure function calls is not a good idea, so we should just write our own subclass of Iterator
and do it properly.
import java.util.StringTokenizer
final class StringTokenizerIterator(
private val tokenizer: StringTokenizer
) extends Iterator[String] {
def hasNext: Boolean = tokenizer.hasMoreTokens
def next(): String = tokenizer.nextToken()
}
object Example {
def toIterator(st: StringTokenizer): Iterator[String] =
new StringTokenizerIterator(st)
def main(args: Array[String]) = {
println(Iterator("a b", "c d")
.map(new java.util.StringTokenizer(_))
.flatMap(toIterator(_))
.toList)
}
}
We're doing the same work you were doing calling the appropriate StringTokenizer
functions, but we're doing it in a full class that encapsulates the state, rather than pretending the stateful part is not there. It's longer code, yes, but it should be longer. We don't want the messy parts of it to go unnoticed.