Flex and Bison $variables giving unexpected values

Time:04-29

In my lexer file I set "yylval.str = yytext" for the token "name". Then in my bison file I try to read that str value to get the name as a string. However, when I read $2 I end up getting not only the token name, but also the rest of the line.

For example, a line could be "MOVE Z TO XY" where Z and XY are both names. In this instance, I would expect $2's value to be "Z" and $4's value to be "XY". But what actually happens is $2's value is "Z TO XY", and $4's value is "XY". I imagine $4 has the same problem, but has nothing else at the end of the line so it doesn't cause any issue.

Why is $2 giving the entire remainder of the line like this? How do I just get the variable name?



(Shortened) Lexer code:

"MOVE"                  {return (MOVE);}
"TO"                    {return (TO);}
[0-9]+                  {yylval.num = atoi(yytext); return (INTEGER);}
[a-z][a-z0-9\-]*        {yylval.str = _strlwr(yytext); return (NAME);}

(Shortened) Parser code:

%token MOVE
%token TO
%token <num> INTEGER
%token <str> NAME
%union{
    int num;
    char *str;
}

move:
    MOVE NAME TO NAME PERIOD { printf("<Var1: %s>, <Var2: %s>", $2, $4);}
    | MOVE INTEGER TO NAME PERIOD { printf("<Val: %d>, <Var: %s>", $2, $4); }

Input:

MOVE Z TO XY-1
MOVE 15 TO XY-1

Output:

<Var1: z TO xy-1>, <Var2: xy-1>
<Val: 15>, <Var: xy-1>

CodePudding user response:

> In my lexer file I set "yylval.str = yytext" for the token "name". Then in my bison file I try to read that str value to get the name as a string. However, when I read $2 I end up getting not only the token name, but also the rest of the line.

That's not at all surprising. yytext is a pointer into the scanner's input buffer, starting at the position of the current match. Flex guarantees a string terminator right after the token characters only while the lexer action is running. By the time the parser looks at the pointed-to data, that terminator is usually gone, so reading the pointer as a string runs past the end of the token and into whatever follows it in the buffer (but see below).

Furthermore, it is possible that by the time the parser gets around to looking at the token's semantic value, the lexer has read new data into the input buffer, yanking the original token text right out from under you.
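To make the mechanism concrete, here is a minimal sketch in plain C. It is a hypothetical model, not Flex itself: name_values and scan_name are made-up names, and the "scanner" is just a function that places a temporary terminator after the match and removes it again, which is roughly what happens around a lexer action.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not Flex itself: the two ways a lexer action could
   hand a NAME token's text to the parser. */
typedef struct {
    char *alias;  /* points into the buffer, like yylval.str = yytext */
    char *copy;   /* heap copy made while the terminator was in place */
} name_values;

/* "Match" len bytes of buf starting at start, the way a scanner would. */
name_values scan_name(char *buf, size_t start, size_t len)
{
    name_values v;
    char held;

    assert(start + len <= strlen(buf));

    held = buf[start + len];
    buf[start + len] = '\0';          /* temporary terminator during the action */

    v.alias = buf + start;
    v.copy = malloc(len + 1);
    if (v.copy != NULL)
        memcpy(v.copy, v.alias, len + 1);   /* token text plus its terminator */

    buf[start + len] = held;          /* undone before the next token is scanned */
    return v;
}
```

Called on a writable buffer holding "z TO xy-1" with start 0 and len 1, alias reads back as "z TO xy-1" once the function returns, while copy is just "z". That is the same pattern as the output in the question. The caller owns copy and has to free() it.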

> How do I just get the variable name?

To get the token text as a string that you can access later, and to make sure it doesn't get modified out from under you, you need to make a copy of it, most likely in dynamically allocated memory. Inside the lexer action only, you can rely on Flex to have provided a temporary string terminator, so you may use strdup() (if you have it) to make such a copy. You appear to be using Microsoft's C library, which does have strdup, although it prefers the spelling _strdup.

Then:

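[a-z][a-z0-9\-]*        {
                            /* yytext is only terminated inside this action, so copy it now */
                            char *temp = strdup(yytext);
                            if (temp == NULL) { /* handle allocation error */ }
                            else {
                                yylval.str = _strlwr(temp);  /* lowercase the copy, not the input buffer */
                                return (NAME);
                            }
                        }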

You will need to ensure that the dynamically allocated semantic values of your tokens are freed by the parser when they are no longer needed, before the pointers to them are lost.
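One way to do that, as a sketch against the shortened grammar from the question: free each name in the action that consumes it, and let Bison clean up values it discards on its own. This assumes the lexer now stores strdup()'d strings in yylval.str and that <stdlib.h> is included in the parser's prologue. %destructor is a Bison extension, not part of POSIX yacc.

```yacc
/* Fragment to merge into the existing grammar file. */

/* Run by Bison when it discards a <str> value itself, e.g. during error
   recovery. It is not run for symbols consumed by a normal reduction. */
%destructor { free($$); } <str>

%%

move:
    MOVE NAME TO NAME PERIOD {
        printf("<Var1: %s>, <Var2: %s>", $2, $4);
        free($2);   /* release the copies made by the lexer */
        free($4);
    }
    | MOVE INTEGER TO NAME PERIOD {
        printf("<Val: %d>, <Var: %s>", $2, $4);
        free($4);
    }
    ;
```

If an action needs to keep a name around, for example in a symbol table, it should take over the pointer instead of freeing it, and whatever owns the table frees it later.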
