How to parse Number with comma via Megaparsec

Time:12-06

Currently I have a parser:

pScientific :: Parser Scientific
pScientific = lexeme L.scientific

This is able to easily parse something like 4087.00

but fails when the number is 4,087.00. Is there a way to make Megaparsec parse a number with commas?

PS: I am very new to Haskell, so I apologize if this is a stupid question.

CodePudding user response:

If the rest of your parser does not have commas, a cheap and cheerful solution would be to simply delete them all before parsing.

If you do need to retain the commas during parsing, then your best bet is most likely to look up the source for scientific, copy, paste, and tweak -- I don't know of a pre-made parser for this that accepts commas.
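
The first suggestion can be sketched in a few lines. This is a minimal sketch, assuming the input is a String and that commas play no other role in it; parseWithoutCommas and the Parser alias are names made up for illustration, not part of Megaparsec.

```haskell
import Data.Scientific (Scientific)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed parser type: no custom errors, String input.
type Parser = Parsec Void String

-- Hypothetical helper: drop every comma, then run the stock parser.
-- parseMaybe only succeeds if the whole (comma-free) input is consumed.
parseWithoutCommas :: String -> Maybe Scientific
parseWithoutCommas =
  parseMaybe (L.scientific :: Parser Scientific) . filter (/= ',')
```

Note that this accepts commas in any position (for example 4,0,8,7), so it is only suitable when the input is known to be well formed.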

CodePudding user response:

This is not parsed because scientific parsers are typically written with formats like JSON in mind. JSON does not allow a comma inside a number, since the comma is used to separate elements in arrays and objects.

We can take a look at a JSON-oriented implementation of scientific, written with attoparsec [src]:

-- | Parse a JSON number.
scientific :: Parser Scientific
scientific = do
  sign <- A.peekWord8'
  let !positive = not (sign == W8_MINUS)
  when (sign == W8_PLUS || sign == W8_MINUS) $
    void A.anyWord8

  n <- decimal0

  let f fracDigits = SP (B.foldl' step n fracDigits)
                        (negate $ B.length fracDigits)
      step a w = a * 10 + fromIntegral (w - W8_0)

  dotty <- A.peekWord8
  SP c e <- case dotty of
              Just W8_DOT -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)
              _           -> pure (SP n 0)

  let !signedCoeff | positive  =  c
                   | otherwise = -c

  (A.satisfy (\ex -> case ex of W8_e -> True; W8_E -> True; _ -> False) *>
      fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
    return (Sci.scientific signedCoeff    e)
{-# INLINE scientific #-}

The main thing to change is the decimal0 part, which captures a sequence of one or more decimal digits. We can for example implement a comma-tolerant version with:

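import qualified Data.Attoparsec.ByteString as A
import Data.Attoparsec.ByteString.Char8 (isDigit_w8)
import qualified Data.ByteString as B

decimal0' :: A.Parser Integer
decimal0' = do
  -- Take a run of digits and commas (44 is ','), then drop the commas.
  digits <- B.filter (/= 44) <$> A.takeWhile1 (\x -> isDigit_w8 x || x == 44)
  -- 48 is '0': reject a leading zero, as the JSON grammar does.
  if B.length digits > 1 && B.head digits == 48
    then fail "leading zero"
    else return (bsToInteger digits)

-- Stand-in for the internal helper of the same name: ASCII digits to Integer.
bsToInteger :: B.ByteString -> Integer
bsToInteger = B.foldl' (\a w -> a * 10 + fromIntegral (w - 48)) 0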

and then use that one with:

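-- Requires the BangPatterns extension for the !-patterns below.
import Control.Applicative ((<|>))
import Control.Monad (void, when)
import Data.Attoparsec.ByteString (Parser)
import qualified Data.Attoparsec.ByteString as A
import Data.Attoparsec.ByteString.Char8 (decimal, isDigit_w8, signed)
import Data.Scientific (Scientific)
import qualified Data.Scientific as Sci

-- Strict pair of coefficient and exponent.
data SP = SP !Integer {-# UNPACK #-} !Int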

-- | Parse a JSON number, allowing commas in the integral part.
scientific' :: Parser Scientific
scientific' = do
  sign <- A.peekWord8'
  let !positive = not (sign == 45)            -- 45 is '-'
  when (sign == 43 || sign == 45) $           -- 43 is '+'
    void A.anyWord8

  n <- decimal0'

  let f fracDigits = SP (B.foldl' step n fracDigits)
                        (negate $ B.length fracDigits)
      step a w = a * 10 + fromIntegral (w - 48)   -- 48 is '0'

  dotty <- A.peekWord8
  SP c e <- case dotty of
              Just 46 -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)   -- 46 is '.'
              _       -> pure (SP n 0)

  let !signedCoeff | positive  =  c
                   | otherwise = -c

  -- 101 is 'e', 69 is 'E'
  (A.satisfy (\ex -> ex == 101 || ex == 69) *>
      fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
    return (Sci.scientific signedCoeff    e)
{-# INLINE scientific' #-}

This does not check that a comma appears only after every three digits, so that will require extra logic, but it is a basic implementation that accepts commas in the integral part of the Scientific.
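
Since the question is about Megaparsec, the same idea, including the three-digit grouping check, might look roughly like the following there. This is only a sketch under assumptions: the Parser alias, integerDigits and commaScientific are hypothetical names, the input is assumed to be a String, and signs are not handled (as with plain L.scientific).

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import Data.Scientific (Scientific, scientific)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, char', digitChar)
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed parser type: no custom errors, String input.
type Parser = Parsec Void String

-- Integral part: either 1-3 digits followed by one or more ",ddd" groups,
-- or a plain run of digits without any commas.
integerDigits :: Parser String
integerDigits = try grouped <|> some digitChar
  where
    grouped = do
      lead <- count' 1 3 digitChar
      rest <- some (char ',' *> count 3 digitChar)
      pure (lead ++ concat rest)

-- Hypothetical comma-aware replacement for L.scientific.
commaScientific :: Parser Scientific
commaScientific = do
  intDigits  <- integerDigits
  -- Optional fractional part; try leaves a trailing '.' unconsumed.
  fracDigits <- option "" (try (char '.' *> some digitChar))
  -- Optional exponent such as e3 or E-2.
  expo       <- option 0 (try (char' 'e' *> L.signed (pure ()) L.decimal))
  let coeff = foldl' (\a d -> a * 10 + toInteger (digitToInt d)) 0
                     (intDigits ++ fracDigits)
  pure (scientific coeff (expo - length fracDigits))
```

With the lexeme helper from the question, this could be plugged in as pScientific = lexeme commaScientific. A malformed group such as 1234,567 is not consumed as one number: the parser stops after 1234 and leaves the rest of the input for whatever follows.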
