How to distinguish tokens that have similar patterns in the lexer but occur in different contexts

Time:12-31

I have two very similar patterns in Lexer.x: the first for numbers, the second for bytes. Here they are:

$digit=0-9
$byte=[a-f0-9]


    $digit+                       { \s -> TNum  (readRational s) }
    $digit+ \. $digit+            { \s -> TNum  (readRational s) }
    $digit+ \. $digit+ e $digit+  { \s -> TNum  (readRational s) }
    $digit+ e $digit+             { \s -> TNum  (readRational s) }
    $byte$byte                        { \s -> TByte (encodeUtf8(pack s))     }

And this is my Parser.y:

%token

        cnst                            { TNum  $$}
        byte                            { TByte  $$}
        '['                            { TOSB     }    
        ']'                            { TCSB     }

%%

Expr: 
 '[' byte ']' {$2}
| cnst {$1}

When I run it on these inputs, I get:

[ 11 ] parse error
11 ok

But when I put the byte pattern before the number patterns in the lexer:

$digit=0-9
$byte=[a-f0-9]

    $byte$byte                        { \s -> TByte (encodeUtf8(pack s))     }
    $digit+                       { \s -> TNum  (readRational s) }
    $digit+ \. $digit+            { \s -> TNum  (readRational s) }
    $digit+ \. $digit+ e $digit+  { \s -> TNum  (readRational s) }
    $digit+ e $digit+             { \s -> TNum  (readRational s) }

I get:

[ 11 ] ok
11 parse error

I think this happens because the lexer turns the string into tokens first and only then hands them to the parser. When both patterns match the same input, the rule listed first wins, so when the parser expects a byte token it receives a number token (or the other way around), and it has no opportunity to turn that value into a different token. What should I do in this situation?

CodePudding user response:

In that case you should postpone the decision. You can, for example, add a TNumByte data constructor that stores the matched text as a String:

data Token
    = TByte ByteString
    | TNum Rational
    | TNumByte String
    -- …

For a sequence of two digits, it is not yet clear whether we have to interpret it as a byte or as a number, so we construct a TNumByte for it:

$digit=0-9
$byte=[a-f0-9]

$digit$digit                  { TNumByte }
$byte$byte                    { \s -> TByte (encodeUtf8(pack s)) }
$digit+                       { \s -> TNum  (readRational s) }
$digit+ \. $digit+            { \s -> TNum  (readRational s) }
$digit+ \. $digit+ e $digit+  { \s -> TNum  (readRational s) }
$digit+ e $digit+             { \s -> TNum  (readRational s) }
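The order of these rules matters because Alex takes the longest match and, when several rules match the same length, the rule that appears first. The following is only a small model of that selection in plain Haskell, not Alex itself; the names Rule, twoOf, digits, rules and pick are made up for illustration, and only the integer form of the number rule is modelled.

```haskell
import Data.Char (isDigit)

-- A rule is a token name plus a function that returns the length of the
-- prefix it matches, if it matches at all. (Hypothetical model, not Alex.)
type Rule = (String, String -> Maybe Int)

-- Characters allowed by the $byte set: a-f and 0-9.
isByteChar :: Char -> Bool
isByteChar c = isDigit c || c `elem` "abcdef"

-- Matches exactly two leading characters that satisfy the predicate.
twoOf :: (Char -> Bool) -> String -> Maybe Int
twoOf p (a:b:_) | p a && p b = Just 2
twoOf _ _ = Nothing

-- Matches one or more leading digits.
digits :: String -> Maybe Int
digits s = case length (takeWhile isDigit s) of
  0 -> Nothing
  n -> Just n

-- Same order as the lexer rules: ambiguous rule first, then byte, then number.
rules :: [Rule]
rules =
  [ ("TNumByte", twoOf isDigit)
  , ("TByte",    twoOf isByteChar)
  , ("TNum",     digits)
  ]

-- Longest match wins; on a tie the rule listed first is kept.
pick :: [Rule] -> String -> Maybe (String, Int)
pick rs input =
  foldl better Nothing [ (name, n) | (name, m) <- rs, Just n <- [m input] ]
  where
    better Nothing c = Just c
    better (Just (bn, bl)) (n, l)
      | l > bl    = Just (n, l)
      | otherwise = Just (bn, bl)
```

In this model, pick rules "11" selects TNumByte because all three rules match two characters and TNumByte is listed first, while pick rules "123" selects TNum because its match is longer, and pick rules "1f" selects TByte.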

Then, in the parser, we can decide based on the context:

%token

  cnst                           { TNum $$ }
  byte                           { TByte $$ }
  numbyte                        { TNumByte $$ }
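A minimal sketch of how that context-based decision could work, written as plain Haskell over a token list rather than as Happy rules. The Expr type and the names asByte, asNum and parseExpr are assumptions for illustration and do not come from the post; the conversion of digits to a Rational also stands in for readRational.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Token type from the answer, plus the two bracket tokens from the question.
data Token
  = TByte ByteString
  | TNum Rational
  | TNumByte String   -- two digits: could be a byte or a number
  | TOSB              -- '['
  | TCSB              -- ']'
  deriving (Eq, Show)

-- Hypothetical result type; the post does not show one.
data Expr
  = EByte ByteString
  | ENum Rational
  deriving (Eq, Show)

-- Interpret an ambiguous lexeme as a byte.
asByte :: String -> ByteString
asByte = BC.pack

-- Interpret an ambiguous lexeme as a number. The lexeme holds digits only,
-- so reading it as an Integer cannot fail.
asNum :: String -> Rational
asNum s = fromInteger (read s)

-- Inside brackets a TNumByte is treated as a byte; on its own it is a number.
parseExpr :: [Token] -> Either String Expr
parseExpr [TOSB, TByte b, TCSB]    = Right (EByte b)
parseExpr [TOSB, TNumByte s, TCSB] = Right (EByte (asByte s))
parseExpr [TNum n]                 = Right (ENum n)
parseExpr [TNumByte s]             = Right (ENum (asNum s))
parseExpr ts                       = Left ("parse error: " ++ show ts)
```

In a Happy grammar the same idea would presumably take the form of extra alternatives: wherever byte is accepted, numbyte is accepted too and converted like asByte, and wherever cnst is accepted, numbyte is accepted and converted like asNum.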