I'd like to parse some very simple text for example,
"abcxyzzzz\nhello\n"
into ["abcxyzzz", "hello"] :: String
.
Not looking for a simpler function to do this (like words
) as I need to parse something more complex and I'm just laying the foundations here.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
module RgParse where
import Data.Text (Text)
import Text.Megaparsec
import Text.Megaparsec.Char
data SimpleData = SimpleData String deriving (Eq, Show, Ord)
data SimpleData' = SimpleData' [String] deriving (Eq, Show, Ord)
instance ShowErrorComponent SimpleData where
showErrorComponent = show
instance ShowErrorComponent String where
showErrorComponent = show
simple :: Parsec String Text SimpleData
simple = do
x <- many (noneOf (Just '\n'))
pure $ SimpleData x
simple' :: Parsec String Text SimpleData'
simple' = do
x <- many (many (noneOf (Just '\n')))
pure $ SimpleData' x
example2 :: Text
example2 = "abcxyzzzz\nhello\n"
main :: IO ()
main = do
print "Simple:"
case parse simple "<stdin>" example2 of
Left bundle -> putStr (errorBundlePretty bundle)
Right result -> print result
print "Simple':"
case parse simple' "<stdin>" example2 of
Left bundle -> putStr (errorBundlePretty bundle)
Right result -> print result
print "done.."
The above unfortunately runs into an infinite loop / space leak upon entering simple'
as it outputs the following:
Hello, Haskell!
[]
"Simple:"
SimpleData "abcxyzzzz"
"Simple':"
Using megaparsec-7.0.5 (not the latest 9.x.x).
Is there possibly a simpler approach to getting multiple lines?
CodePudding user response:
Apply many
only to a parser that either consumes at least one token (here, one Char
) or fails. That's because many
works by running its argument until it fails. many x
may consume zero tokens, so many (many x)
breaks this requirement.
Note that a line should at least involve a terminating newline. That allows that requirement to be fulfilled.
oneline :: Parsec String Text String
oneline = many (noneOf (Just '\n')) <* single '\n'
manylines :: Parsec String Text [String]
manylines = many oneline
simple :: Parsec String Text SimpleData
simple = do
x <- oneline
pure $ SimpleData x
simple' :: Parsec String Text SimpleData'
simple' = do
x <- manylines
pure $ SimpleData' x
A looser requirement for many p
is that any repetition of p
must fail after a finite number of iterations (and here p = many x
never fails), so p
might consume nothing in some steps, but then it must be stateful so that after some repetitions it eventually consumes something or fails. But the above approximation is a pretty good rule of thumb in practice.