How do I convert a MegaParsec "ParseErrorBundle" into a list of "SourcePos" and

Time:11-28

Using Megaparsec's parse function, I can run a parser and get a ParseErrorBundle if it fails.

I know that I can pretty-print the ParseErrorBundle with errorBundlePretty, which gives a single error message for the entire parse failure, including the line and column numbers.
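For reference, here is a minimal sketch of that errorBundlePretty route, assuming megaparsec 9.x. The parser `digits` and the wrapper `runDigits` are hypothetical names used only for illustration; the file can be loaded into GHCi and `runDigits` called directly.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar)

type Parser = Parsec Void Text

-- Hypothetical toy parser: one or more digits, then end of input.
digits :: Parser String
digits = some digitChar <* eof

-- On failure, errorBundlePretty renders the whole bundle as one String
-- whose sections start with "file:line:column:" headers.
runDigits :: Text -> Either String String
runDigits input = case parse digits "<input>" input of
  Left bundle -> Left (errorBundlePretty bundle)
  Right ds    -> Right ds
```

With input "12x", the rendered message should begin with "<input>:1:3:", since the unexpected 'x' is the third character of line 1.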

I also know that I can get a list of ParseErrors from a ParseErrorBundle using bundleErrors, and that I can pretty-print each of them with either parseErrorPretty or parseErrorTextPretty.
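A sketch of the bundleErrors route, assuming megaparsec 9.x; `digits` and `offsetsAndMessages` are hypothetical names for illustration. It shows the gap the question is about: each ParseError stores only an offset into the input, so the per-error messages come out without a line or column.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.List.NonEmpty as NE
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar)

type Parser = Parsec Void Text

-- Hypothetical toy parser used only to produce an error bundle.
digits :: Parser String
digits = some digitChar <* eof

-- Pair each individual error's raw offset (a token index, not a
-- line/column) with its message text.
offsetsAndMessages :: ParseErrorBundle Text Void -> [(Int, String)]
offsetsAndMessages bundle =
  [ (errorOffset e, parseErrorTextPretty e) | e <- NE.toList (bundleErrors bundle) ]
```

For the input "12x" this yields a single entry whose offset is 2; turning that offset into a SourcePos is the missing step.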

I want to be able to run a parser and, if it fails, get a list of (SourcePos, Text), so that I know both the individual error messages and the location of each error. I can't figure out an elegant way to do this. I could in theory crib fairly heavily from the source code of errorBundlePretty, but surely folding over the errors and using reachOffset to advance the PosState can't be the easiest way to go about this?

Answer:

Note that, if you're using megaparsec >= 7.0.0, I think you're supposed to use attachSourcePos for the traversal. It returns a pair: a NonEmpty of (ParseError, SourcePos) pairs, plus the final PosState (hence the fst below). I think it would look like:

import qualified Text.Megaparsec as MP
import qualified Data.Text as T
import Data.List.NonEmpty (NonEmpty (..))
import Data.Void (Void)

-- Pair each error's SourcePos with its rendered message text.
-- attachSourcePos returns (annotated errors, final PosState); fst keeps
-- only the annotated errors.
annotateErrorBundle :: MP.ParseErrorBundle T.Text Void -> NonEmpty (MP.SourcePos, T.Text)
annotateErrorBundle bundle
  = fmap (\(err, pos) -> (pos, T.pack . MP.parseErrorTextPretty $ err)) . fst $
    MP.attachSourcePos MP.errorOffset
                       (MP.bundleErrors bundle)
                       (MP.bundlePosState bundle)

Note that, unlike your proposed answer, attachSourcePos threads the PosState through the traversal of the error bundle, rather than throwing away the updated state after every reachOffset call. As a result, I believe it will be more efficient when there are many errors. (It also uses reachOffsetNoLine instead of reachOffset, which may be more efficient for certain stream types.)
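To make the "threading" point concrete, here is a hand-rolled sketch of the same idea using mapAccumL and reachOffsetNoLine. It is not the library's source for attachSourcePos, just an illustration written against the megaparsec 9.x API; `annotateThreaded` and the multi-error parser `digitLines` are hypothetical names. Like attachSourcePos, it relies on the errors being in ascending offset order.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List.NonEmpty (NonEmpty)
import Data.Text (Text)
import Data.Traversable (mapAccumL)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar, newline)

type Parser = Parsec Void Text

-- Each error starts from the PosState left behind by the previous one, so
-- the input is walked once overall rather than once per error. This
-- assumes the errors are sorted by offset.
annotateThreaded :: ParseErrorBundle Text Void -> NonEmpty (SourcePos, String)
annotateThreaded bundle =
  snd (mapAccumL step (bundlePosState bundle) (bundleErrors bundle))
  where
    step :: PosState Text -> ParseError Text Void -> (PosState Text, (SourcePos, String))
    step st err =
      let st' = reachOffsetNoLine (errorOffset err) st
       in (st', (pstateSourcePos st', parseErrorTextPretty err))

-- Hypothetical parser: each malformed line is registered as a delayed
-- error and skipped, so one run can produce several errors.
digitLines :: Parser ()
digitLines = () <$ many oneLine <* eof
  where
    oneLine :: Parser ()
    oneLine = withRecovery skipLine (() <$ some digitChar <* newline)
    skipLine :: ParseError Text Void -> Parser ()
    skipLine err = registerParseError err <* skipManyTill anySingle newline
```

Running `parse digitLines "<input>" "12\nab\n3x\n"` and passing the resulting bundle to `annotateThreaded` should report line 2, column 1 and line 3, column 2.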

If you're using megaparsec < 7.0.0, you could try adapting the source of attachSourcePos from a later version.

Answer:

I was able to get this to work as follows:

import qualified Text.Megaparsec as MP
import Data.List.NonEmpty (NonEmpty (..))
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void (Void)

annotateErrorBundle :: MP.ParseErrorBundle Text Void -> NonEmpty (MP.SourcePos, Text)
annotateErrorBundle bundle = (\e -> (errorSrcPos e, T.pack $ MP.parseErrorTextPretty e)) <$> errors
  where
    initialPosState = MP.bundlePosState bundle
    errors = MP.bundleErrors bundle
    -- Each lookup starts again from the initial PosState.
    errorSrcPos e = MP.pstateSourcePos . snd $ MP.reachOffset (MP.errorOffset e) initialPosState

I suspect this isn't very efficient, because each reachOffset call starts from the initial PosState and rescans the input up to that error's offset. In practice, though, the list of errors probably isn't that large, so I'm not too worried.
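One reason the per-error cost rarely matters: as far as I know, a bundle only holds more than one error when the parser reports errors in a delayed way (registerParseError, typically via withRecovery); an ordinary parse failure produces a single-error bundle. The sketch below, written against the megaparsec 9.x API (where reachOffset returns a pair), builds such a multi-error bundle with a hypothetical `digitLines` parser and checks that the restart-from-the-beginning approach and attachSourcePos agree. `naivePositions` and `threadedPositions` are hypothetical helper names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar, newline)

type Parser = Parsec Void Text

-- Hypothetical parser: every malformed line is registered as a delayed
-- error and skipped, so one run can yield several errors.
digitLines :: Parser ()
digitLines = () <$ many oneLine <* eof
  where
    oneLine :: Parser ()
    oneLine = withRecovery skipLine (() <$ some digitChar <* newline)
    skipLine :: ParseError Text Void -> Parser ()
    skipLine err = registerParseError err <* skipManyTill anySingle newline

-- Every error restarts from the initial PosState (the approach above).
naivePositions :: ParseErrorBundle Text Void -> [SourcePos]
naivePositions bundle =
  [ pstateSourcePos (snd (reachOffset (errorOffset e) (bundlePosState bundle)))
  | e <- foldr (:) [] (bundleErrors bundle) ]

-- attachSourcePos threads the PosState from one error to the next.
threadedPositions :: ParseErrorBundle Text Void -> [SourcePos]
threadedPositions bundle =
  [ pos
  | (_, pos) <- foldr (:) [] (fst (attachSourcePos errorOffset (bundleErrors bundle) (bundlePosState bundle))) ]
```

Both functions should return the same positions for the same bundle; the difference between them is only how much of the input gets rescanned.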
