What are the potential drawbacks of using the Facade design pattern in C?-CodePudding

I've been writing a compiler for some time now as a passion project. As it stands, the compiler is written fully in C . I employ the visitor pattern for things like type checking, code generation, debug printing, etc. I've realized over time, I don't need such a complex system to represent the various AST nodes. Whenever I want to add another AST node, it requires adding tons of boilerplate code and methods to "visit" this node.

I'm playing around with the idea of rewriting the front end of the compiler largely in C where I'll still use the C LLVM-IR API in the back end to simplify things. I think that OOP might add some unnecessary complexities to how I'm representing AST nodes, so I'm toying with the Facade pattern in C:

typedef enum {
  AST_NUM,
  AST_EXPR
} NodeType;

typedef struct AstNumNode AstNumNode;
typedef struct AstExprNode AstExprNode;

typedef struct AstNode {
  NodeType type;
  union {
    AstNumNode* num;
    AstExprNode* expr;
  } actual;
} AstNode;

struct AstExprNode {
  AstNode* left;
  AstNode* right;
};

struct AstNumNode {
  int value;
};

Of course, I'll still have to write methods such as:

void ast_node_print(AstNode* node, int depth); // switch on node->type and select one of the below methods based on that type
void ast_num_print(AstNumNode* node, int depth);
void ast_expr_print(AstExprNode* node, int depth);

which I view as much simpler than the visitor pattern.

However, when I'm creating these nodes, it can be a bit tedious to do the following:

AstNode* numNode = malloc(sizeof(AstNode));
numNode->type = AST_NUM;
numNode->actual.num = malloc(sizeof(AstNumNode));
numNode->actual.num->value = 10;

The last drawback I can see is space. Each AstNode will take up the same amount of space, regardless of what "sub-node" it wraps, because of the union. When compiling large programs, I can see this becoming a problem.

CodePudding user response：

The last drawback I can see is space. Each AstNode will take up the same amount of space, regardless of what "sub-node" it wraps, because of the union

In C, use a flexible array member so memory for the last member is sized as needed.

typedef struct AstNode {
  NodeType type;
  union {
    AstNumNode num;
    AstExprNode expr;
  } actual[];
} AstNode;

... and then allocate to the needed size

AstNode* numNode;
switch (type) {
  case AST_NUM: numNode = malloc(sizeof numNode[0]   sizeof numNode->actual[0].num); 
    numNode->type = type;
    numNode->actual.num.value = 10;
    break;
  case AST_EXPR: numNode = malloc(sizeof numNode[0]   sizeof numNode->actual[0].expr);
    // Assign values
    numNode->type = type;
    numNode->actual.expr.left = NULL;
    numNode->actual.expr.right = NULL;
    break;

This rolls 2 allocations done in old code into 1.
Yet without seeing the larger context, it is unclear if it is worth it.

True goal

OP seems to want to reduce complexity/code and object memory space. Often these are in conflict with earth other. Decide: choose a balance or choose one

CodePudding user response：

I already tried the suggested approach. In the end, the union became so big as the code evolved, it was hard to keep the code readable. Also, there was some boilerplate to initiate the correct members for each kind of node.

I also think that the visitor pattern is verbose. What I ended using was a base class named AstNode that has a kind (your NodeType type). Each class that extends AstNode set the appropriate kind. It is considered bad because I do a lot of pointer casts, but in the end was more sane for me than the visitor approach.

Although not using visitor pattern, I ended up coding a lot of visitor like methods, but without the boilerplate (same function worked for many nodes with some typecasting).

To have a kind in the AstNode also helped me to keep track of cases that I didn't implemented yet. I can do a switch case and put a default with a warning message to remind me of an implemented case.

I don't know where is your language right now. Mine is at the intermediate code generation step and how I access the Ast methods here helped me to drive the "final" struct. For instance, I wanted to get easy access for stuff like the type, symbol, size in bytes etc of an identifier, so I sprinkled the AstIdentifier with pointers for such values. If I used your union approach it would end up too big for my tracking abilities. So instead of passing symbol table context around or whatever I can just use id.get_type(), id.get_size(), id.get_address() and so on.

In the end, my suggestion is: go with what is easier for you. There is really no right and wrong way to do it. It took me some time to get into this approach and feel comfortable with it.

If you can, read the book "Language Implementation Patterns" by Terence Parr. If I remember correctly, chapter 4 discuss different approaches to build AST, the advantages and disadvantages of each one, including the one you're suggesting.