Some background on why I'm asking this question.
I'm reading Writing An Interpreter In Go. In the book, a Token struct is stored inside the AST nodes. Node is an interface that a type satisfies by implementing TokenLiteral() and String():
type IntegerLiteral struct {
	Token token.Token
	Value int64
}
type Node interface {
	TokenLiteral() string
	String() string
}
I understand that in real life a compiler must report the row and column of each error, and that the lexer alone can't detect most errors, so this location information must be passed on to the parser. For example, Go's standard go/ast package declares its AST node interface as shown below (Pos is defined in go/token):
type Pos int
// All node types implement the Node interface.
type Node interface {
	Pos() token.Pos // position of first character belonging to the node
	End() token.Pos // position of first character immediately after the node
}
Long version of my question
AFAIK, a compiler front end works like this: stream of chars -> stream of tokens -> AST. At each level, some details are abstracted away. In my view, a Token should not be part of an AST node.
- Should a token be part of an AST node?
- Could you give examples of which programming languages choose which approach?
CodePudding user response:
The exact nature of the AST is an implementation detail of the compiler (or parsing library). Different AST implementations have different fields, even when they implement the same language.
It is almost always the case that there will be some mechanism for extracting source location information from an AST node, both for error messages and for debugging information embedded in the compiled output. That could be done by adding a location object (or objects) to every AST node type. Alternatively, the location information could be held in token objects that are somehow discoverable from the AST node. Or an implementation could mix these strategies and expose a single Location getter method on every node.
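To make the mixed strategy concrete, here is a minimal sketch in Go. Every name in it (Position, Token, Node, Location, describe) is hypothetical and chosen for illustration; it is not taken from the book or from any real compiler. One node type keeps its token and reads the location from it, another stores a location object directly, and callers see only the getter.

```go
package main

import "fmt"

// Position is a hypothetical line/column pair.
// Real compilers often store a compact offset instead.
type Position struct {
	Line, Column int
}

// Token is a hypothetical lexer token that remembers where it started.
type Token struct {
	Literal string
	Pos     Position
}

// Node exposes one getter, so callers never need to know
// how a particular node type stores its location.
type Node interface {
	Location() Position
}

// IntegerLiteral keeps its token and reads the location from it.
type IntegerLiteral struct {
	Token Token
	Value int64
}

func (n *IntegerLiteral) Location() Position { return n.Token.Pos }

// BinaryExpr keeps no token; it stores a location object directly.
type BinaryExpr struct {
	Start       Position
	Op          string
	Left, Right Node
}

func (n *BinaryExpr) Location() Position { return n.Start }

// describe formats a diagnostic for any node, whichever strategy it uses.
func describe(n Node, msg string) string {
	p := n.Location()
	return fmt.Sprintf("%d:%d: %s", p.Line, p.Column, msg)
}

func main() {
	one := &IntegerLiteral{Token: Token{Literal: "1", Pos: Position{Line: 3, Column: 5}}, Value: 1}
	two := &IntegerLiteral{Token: Token{Literal: "2", Pos: Position{Line: 3, Column: 9}}, Value: 2}
	sum := &BinaryExpr{Start: one.Location(), Op: "+", Left: one, Right: two}

	fmt.Println(describe(two, "example diagnostic")) // 3:9: example diagnostic
	fmt.Println(describe(sum, "example diagnostic")) // 3:5: example diagnostic
}
```

Because error reporting goes through Location() only, whether a given node holds a Token stays a private decision of that node type.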
I can't think of a good reason either to insist on token objects in an AST or to prohibit them. The data for an AST node representing a single-token literal or identifier might well be held in a Token object. Why not?