Currently I have a parser:
pScientific :: Parser Scientific
pScientific = lexeme L.scientific
This easily parses something like 4087.00
but fails when the number is written as 4,087.00
Is there a way to make megaparsec parse a number that contains commas?
PS: I am very new to Haskell, so I apologize if this is a stupid question.
CodePudding user response:
If the rest of your parser does not have commas, a cheap and cheerful solution would be to simply delete them all before parsing.
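A minimal sketch of that pre-processing step, assuming the input is a plain String and that the parser type is Parsec Void String (the question does not show its Parser type, so both are assumptions). The name parseIgnoringCommas is hypothetical, and the lexeme wrapper from the question is left out to keep the example self-contained.

```haskell
import Data.Scientific (Scientific)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed parser type; substitute the one your project already defines.
type Parser = Parsec Void String

-- The stock number parser from the question, minus the lexeme wrapper.
pScientific :: Parser Scientific
pScientific = L.scientific

-- Hypothetical helper: drop every comma from the raw input,
-- then run the unchanged parser on what is left.
parseIgnoringCommas :: String -> Maybe Scientific
parseIgnoringCommas = parseMaybe pScientific . filter (/= ',')
```

This only works when commas carry no other meaning in the input; a list such as "1,2" would silently become the single number 12.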
If you do need to retain the commas during parsing, then your best bet is most likely to look up the source for scientific, copy it, and tweak it; I don't know of a pre-made parser for this that accepts commas.
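As a rough illustration of the "tweak" route, here is a hand-written megaparsec parser rather than a copy of the library source. It assumes a Parsec Void String parser type, and commaScientific is a hypothetical name. It drops commas from the integral part, does not check where they are placed, and does not handle a leading sign.

```haskell
import Data.Char (isDigit)
import Data.Scientific (Scientific, scientific)
import Data.Void (Void)
import Text.Megaparsec (Parsec, option, parseMaybe, takeWhile1P, takeWhileP)
import Text.Megaparsec.Char (char, char', digitChar)
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed parser type; substitute the one your project already defines.
type Parser = Parsec Void String

-- Hypothetical parser: like a scientific-number parser, but commas are
-- tolerated (and ignored) anywhere after the first integral digit.
commaScientific :: Parser Scientific
commaScientific = do
  -- Integral part: must start with a digit; later commas are dropped.
  firstDigit <- digitChar
  rest <- takeWhileP (Just "digit or comma") (\c -> isDigit c || c == ',')
  let intDigits = firstDigit : filter (/= ',') rest
  -- Optional fractional part: a dot followed by at least one digit.
  fracDigits <- option "" (char '.' *> takeWhile1P (Just "digit") isDigit)
  -- Optional exponent: 'e' or 'E' followed by a signed integer.
  expo <- option 0 (char' 'e' *> L.signed (pure ()) L.decimal)
  let coeff = read (intDigits ++ fracDigits) :: Integer
  -- Each fractional digit shifts the exponent down by one.
  pure (scientific coeff (expo - length fracDigits))

-- Usage: this value equals Just 4087.
example :: Maybe Scientific
example = parseMaybe commaScientific "4,087.00"
```

Wrapping it as lexeme commaScientific should let it stand in for pScientific from the question. Because trailing commas are consumed as well, it is not suitable for input where a comma also separates values.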
CodePudding user response:
The number is rejected because scientific parsers like this one are mainly designed for formats such as JSON, which does not allow a thousands separator; there, a comma separates elements in arrays and objects.
We can take a look at the implementation of a JSON-oriented scientific parser, written with attoparsec rather than megaparsec [src]:
-- | Parse a JSON number.
scientific :: Parser Scientific
scientific = do
  sign <- A.peekWord8'
  let !positive = not (sign == W8_MINUS)
  when (sign == W8_PLUS || sign == W8_MINUS) $
    void A.anyWord8
  n <- decimal0
  let f fracDigits = SP (B.foldl' step n fracDigits)
                        (negate $ B.length fracDigits)
      step a w = a * 10 + fromIntegral (w - W8_0)
  dotty <- A.peekWord8
  SP c e <- case dotty of
    Just W8_DOT -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)
    _           -> pure (SP n 0)
  let !signedCoeff | positive  = c
                   | otherwise = -c
  (A.satisfy (\ex -> case ex of W8_e -> True; W8_E -> True; _ -> False) *>
      fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
    return (Sci.scientific signedCoeff e)
{-# INLINE scientific #-}
The main thing to change is the decimal0 part, which captures the integral part as a sequence of one or more decimal digits and rejects a leading zero. We can, for example, implement a comma-tolerant version with:
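import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B (unsafeHead)

-- ASCII codes: 44 is ',' and 48 is '0'.
decimal0' :: Parser Integer
decimal0' = do
  -- Accept digits and commas together, then drop the commas.
  digits <- B.filter (/= 44) <$> A.takeWhile1 (\x -> isDigit_w8 x || x == 44)
  if B.length digits > 1 && B.unsafeHead digits == 48
    then fail "leading zero"
    -- Fold the digits into an Integer (stands in for the internal bsToInteger helper).
    else return (B.foldl' (\a w -> a * 10 + fromIntegral (w - 48)) 0 digits)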
and then use that one with:
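{-# LANGUAGE BangPatterns #-}  -- belongs at the very top of the module
import Control.Applicative ((<|>))
import Control.Monad (void, when)
import Data.Attoparsec.ByteString (Parser)
import qualified Data.Attoparsec.ByteString as A
import Data.Attoparsec.ByteString.Char8 (decimal, isDigit_w8, signed)
import Data.Scientific (Scientific)
import qualified Data.Scientific as Sci

-- Strict pair of coefficient and exponent, standing in for the original's SP type.
data SP = SP !Integer {-# UNPACK #-} !Int

-- | Parse a number whose integral part may contain commas.
-- ASCII codes: 43 '+', 45 '-', 46 '.', 48 '0', 69 'E', 101 'e'.
scientific' :: Parser Scientific
scientific' = do
  sign <- A.peekWord8'
  let !positive = sign /= 45
  when (sign == 43 || sign == 45) $
    void A.anyWord8
  n <- decimal0'
  let f fracDigits = SP (B.foldl' step n fracDigits)
                        (negate $ B.length fracDigits)
      step a w = a * 10 + fromIntegral (w - 48)
  dotty <- A.peekWord8
  SP c e <- case dotty of
    Just 46 -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)
    _       -> pure (SP n 0)
  let !signedCoeff | positive  = c
                   | otherwise = -c
  (A.satisfy (\ex -> ex == 101 || ex == 69) *>
      fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
    return (Sci.scientific signedCoeff e)
{-# INLINE scientific' #-}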
This does not check that a comma appears only after every three digits, so that would require extra logic, but it is a basic implementation that accepts commas in the integral part of the Scientific.
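One possible shape for that extra logic, sketched with megaparsec (the library from the question) instead of attoparsec, and covering only the integral part. The name groupedInteger and the Parsec Void String parser type are assumptions. It accepts either plain digits or a leading group of one to three digits followed by comma-separated groups of exactly three; it does not reject leading zeros.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, count, many, notFollowedBy, parseMaybe, some, try)
import Text.Megaparsec.Char (char, digitChar)

-- Assumed parser type; substitute the one your project already defines.
type Parser = Parsec Void String

-- Hypothetical parser for an integer with optional thousands separators.
groupedInteger :: Parser Integer
groupedInteger = do
  leading <- some digitChar
  groups <-
    if length leading <= 3
      -- Each further group is a comma plus exactly three digits;
      -- try backtracks so a malformed group is left unconsumed.
      then many (try (char ',' *> count 3 digitChar <* notFollowedBy digitChar))
      -- More than three leading digits: treat as an ungrouped number.
      else pure []
  pure (read (leading ++ concat groups))

-- Usage: "4,087" is accepted, while "12,34" is rejected because
-- parseMaybe requires the whole input to be consumed.
accepted, rejected :: Maybe Integer
accepted = parseMaybe groupedInteger "4,087"
rejected = parseMaybe groupedInteger "12,34"
```

Since a malformed group is never consumed, a comma that is not a thousands separator stays in the input for whatever parser runs next.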