I'm building a parser for a simple interpreter in JavaScript. I have a preprocessor method that removes specific tokens from an input list of tokens produced by the tokenizer:
const tokenizer = new Tokenizer();
let program = [];
let symbols = {};
tokenizer.init(mainFile, null);
let tokens = tokenizer.exec(mainFile);
for (let i = 0; i < tokens.length; i++) {
  const lookahead = [
    tokens[i + 1],
    tokens[i + 2]
  ];
  if (tokens[i].type == 'PRE_DEFINE') {
    if (lookahead[0].type == 'IDENTIFIER') {
      symbols[tokens[i + 1].value] = String(lookahead[1].value);
      i += 3;
    } else {
      throw new PreprocessorError(`Unexpected token "${lookahead[0].value}"`);
    }
  }
  if (tokens[i].type == 'IDENTIFIER' && String(tokens[i].value) in symbols) {
    program.push(symbols[tokens[i].value]);
  } else {
    program.push(tokens[i].value);
  }
}
return program;
Each token is an object like this, with both type and value being strings:
// Affected constructor identifier token
{
  type: 'IDENTIFIER',
  value: 'constructor'
},
// Number token
{
  type: 'NUMBERLITERAL',
  value: '10'
}
Somewhere in this function, when tokens[i].value == 'constructor', said value is being converted to the actual JS keyword constructor (I would assume) and is showing up in debug as [Function: Object]. The word 'constructor' appearing in a token has caused no issues elsewhere in the code where it's handled and appears normal when console.log'd directly before this loop, so I'm quite confused. Could someone point me in the direction of an explanation here?
I have added several calls to String() to attempt to force 'constructor' to remain a string, but nothing seems to work.
I would assume that I've missed something in my code, but is it possible this is a JS issue?
Thanks!
CodePudding user response:
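The problem here is that symbols is a plain object, so it has a constructor property, which it inherits from its prototype (Object.prototype). The in operator also checks the prototype chain, so 'constructor' in symbols is true even though you never defined that symbol, and reading the property returns the Object constructor, which is a function: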
console.log("constructor" in {}); // true
console.log(typeof ({})["constructor"]); // "function"
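To make the "inherited from the prototype" part concrete, here is a small check you can run. The variable names are only for illustration; the calls themselves are standard built-ins.

```javascript
// A plain object literal, like the symbols table in the question.
const plain = {};

// The `in` operator walks the prototype chain, so it finds inherited names.
const foundByIn = 'constructor' in plain;

// An own-property check ignores the prototype chain.
const isOwn = Object.prototype.hasOwnProperty.call(plain, 'constructor');

// The inherited value is the built-in Object function,
// which Node prints as [Function: Object].
const isObjectFunction = plain.constructor === Object;

console.log(foundByIn);        // true
console.log(isOwn);            // false
console.log(isObjectFunction); // true
```

The same applies to any other name defined on Object.prototype, such as toString or hasOwnProperty, so an identifier token with one of those values would hit the same problem.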
For a lookup table, you can use an object which does not have a prototype. These are created with Object.create(null):
let program = [];
let symbols = Object.create(null);
let tokens = [
  {
    type: 'IDENTIFIER',
    value: 'constructor'
  },
  {
    type: 'NUMBERLITERAL',
    value: '10'
  }
];
for (let i = 0; i < tokens.length; i++) {
  const lookahead = [
    tokens[i + 1],
    tokens[i + 2]
  ];
  if (tokens[i].type == 'PRE_DEFINE') {
    if (lookahead[0].type == 'IDENTIFIER') {
      symbols[tokens[i + 1].value] = String(lookahead[1].value);
      i += 3;
    } else {
      throw new PreprocessorError(`Unexpected token "${lookahead[0].value}"`);
    }
  }
  if (tokens[i].type == 'IDENTIFIER' && tokens[i].value in symbols) {
    program.push(symbols[tokens[i].value]);
  } else {
    program.push(tokens[i].value);
  }
}
console.log(program);
It might be worth considering using a Map instead, since has() and get() on a Map only ever see keys that you have set yourself.
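As a rough sketch of what the Map version could look like: the preprocess function name, the '#define' token value, and the generic Error are assumptions made for this example and are not taken from your code. I have also restructured the loop slightly so that a define at the very end of the input does not read past the array.

```javascript
// Hypothetical helper: replaces defined identifiers using a Map as the symbol table.
function preprocess(tokens) {
  const symbols = new Map();
  const program = [];
  let i = 0;
  while (i < tokens.length) {
    const token = tokens[i];
    if (token.type === 'PRE_DEFINE') {
      const name = tokens[i + 1];
      const replacement = tokens[i + 2];
      if (!name || name.type !== 'IDENTIFIER' || !replacement) {
        // The question throws a PreprocessorError here; a plain Error keeps this runnable.
        throw new Error(`Unexpected token "${name ? name.value : 'end of input'}"`);
      }
      symbols.set(name.value, String(replacement.value));
      i += 3; // skip the define, its name, and its replacement
      continue;
    }
    // Map#has only sees keys added with set(), never inherited names.
    if (token.type === 'IDENTIFIER' && symbols.has(token.value)) {
      program.push(symbols.get(token.value));
    } else {
      program.push(token.value);
    }
    i += 1;
  }
  return program;
}

const result = preprocess([
  { type: 'PRE_DEFINE', value: '#define' }, // assumed token value
  { type: 'IDENTIFIER', value: 'SIZE' },
  { type: 'NUMBERLITERAL', value: '10' },
  { type: 'IDENTIFIER', value: 'constructor' },
  { type: 'IDENTIFIER', value: 'SIZE' },
]);
console.log(result); // [ 'constructor', '10' ]
```

Here 'constructor' passes through as an ordinary string because nothing was ever set under that key, while SIZE is replaced by its defined value.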