I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.
Kern defines a recursive implementation for a "many-till" parser: source
This works fine for small inputs:
(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")
(def between-brackets "parser that grabs all text between brackets"
(between (sym* \{) (sym* \}) (< > (many (none-of* "}")))))
(def parse-enclosed-words "parser that attempts to grab text between brackets,
skips a character when it can't, and halts when it hits the string END"
(many-till (<|> between-brackets (skip any-char)) (token* "END")))
(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")
Unfortunately, the parser encounters stack overflows as input strings grow:
(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError
From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.
How can many-till be modified to avoid stack overflow?
CodePudding user response:
I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till
, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till
builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many
, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.