How to skip parsing the SQL (or part of the SQL) in antlr4?

Sorry, my earlier question was closed and cannot be reopened. Sorry also for my poor English; it was in fact translated by a website. :) The earlier question was: How to skip SQL parsing in antlr4?

@BartKiers Thanks for your interest in this question. Let me give a detailed example.

There are lots of SQL statements, such as select * from user or update user set field1 = value1 where condition = value, etc. Let's call them originalSQLs.

There is a Java program that intercepts all the originalSQLs and parses them into ASTNodes with antlr4. The Java program then rewrites them (the rewrite depends on the parse phase), so the originalSQLs may be parsed and rewritten as select field1, field1_encrypted, field1_digest, field2 from user or update user set field1 = value1, field1_encrypted = encrypt_algorithm(value1), field1_digest = digest_algorithm(value1) where condition_digest = digest_algorithm(values), etc.

Once the rewrite phase has finished, they are executed as SQLStatements: a SELECT is executed as a SelectSQLStatement and an UPDATE as an UpdateSQLStatement.

Now I think some of the originalSQLs should skip the parse phase, and therefore the rewrite phase as well, but these originalSQLs should still be executed as they are.

I am thinking of marking those statements either with a comment, as in /* PARSE_PHASE_SKIPPED=TRUE */ originalSQL, or with a SKIP prefix, as in SKIP originalSQL. I would like antlr4 to turn the marked statement as a whole into an ASTNode, without parsing the originalSQL part, and then execute it as a ParsePhaseSkippedSQLStatement.

Can antlr4 support this, and how should the grammar be written? Thanks in advance.

CodePudding user response：

If I understand your question correctly, you wish to use some sort of comment/annotation to "turn off" execution of the following SQL statement.

(NOTE: You can't really skip "parsing" part of the input. This will address ways in which you could skip processing part of the parsed input, which I believe is what you're ultimately wanting to accomplish.)

This would not be an ANTLR concern. ANTLR's responsibility is to parse your input stream and produce a parse tree (not technically an AST) that correctly represents the structure of your input.

Executing the SQL is not what ANTLR does. It does, however, generate utility Listener and Visitor classes that can be used to cleanly and efficiently navigate the resulting parse tree. There can be a lot of code involved in actually executing the SQL from the parse tree. Often, the first step is to produce an AST from the parse tree to make it easier to deal with.

You have a couple of alternatives (as you hint at).

1 - Using the current grammar and putting these annotations inside of comments (/* PARSE_PHASE_SKIPPED=TRUE */)

This can be done, but it's a bit "messy". Most likely the grammar's COMMENT tokens are handled with -> skip (or perhaps sent to -> channel(HIDDEN)). This makes it MUCH easier to write the parser rules, as you don't have to include optional COMMENTs everywhere a comment could appear. With -> skip the lexer discards the comments, so they never reach the token stream. If you send COMMENT tokens to the HIDDEN channel instead, they are still in the token stream even though they are ignored by the parser. The COMMENT tokens won't be in the rule Context objects that the listeners/visitors deal with, but you could look backwards/forwards in the token stream for COMMENT tokens.

2 - You could introduce some new syntax for annotations (similar to your SKIP idea). To do this you'll have to extend the syntax in the grammar to recognize these annotations. They'd have to be distinguishable from valid SQL, so a simple SKIP is probably not going to work.

The benefit of this approach is that, when you extend the grammar to recognize annotations, you can be very specific about where annotations are allowed. You'd be able to include them in your parse tree, making them easier to locate.

With either of these approaches, you would use a visitor or listener to go through your parse tree looking for the comment/annotation and then mark the subsequent statement with an indicator that you don't want to execute it. (You might also use that information simply to skip the parse-tree-to-AST transformation of the "skipped" nodes.)

CodePudding user response：

Thank you for your reply, @Mike Cargal. Yes, almost.

Let me explain it again with a more detailed example.

There is a Java system, which we'll call X. X contains lots of SQL statements that the developers write and guarantee can be executed correctly by ibatis / jpa etc. Let's call those statements originalSQLs.

Take the following originalSQLs as examples:

insert into user (username, id_no) values ('xyz', '123456')

select username, id_no from user u where u.id_no = '123456'

Suppose the column id_no on table user holds sensitive data, so we should store ciphertext instead of plaintext. The originalSQLs are therefore parsed by ANTLR and rewritten by Java code as shown below. Let's call the results rewrittenSQLs; the rewrittenSQLs must also be executed correctly by ibatis / jpa etc.

insert into user (username, id_no, id_no_cipher, id_no_digest) values ('xyz', '', 'encrypted_123456', 'digest_123456')

select username, id_no_cipher as id_no from user u where u.id_no_digest = 'digest_123456'

In this case:

1. The rewrite phase depends on the parse phase: originalSQLs need to be parsed correctly before the Java code can rewrite them.

2. All originalSQLs are parsed, but only the few that match the sensitive-data rules are rewritten into rewrittenSQLs.

However, there are lots of originalSQLs that we clearly know do not need to be rewritten, and so do not need to be parsed either. Parsing them may even throw exceptions in various complex situations, yet they must still be executed correctly by ibatis / jpa etc.

So I planned to use a SQL comment or a custom keyword annotation to "turn off" the parse phase for them.
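
A minimal sketch of that plan, under the assumption that the check can run in the interceptor before ANTLR is invoked at all, so that a marked statement never reaches the lexer or parser and cannot raise a parse exception. The names SkipMarkerRouter, Routed and route are hypothetical and only for illustration; only the marker text /* PARSE_PHASE_SKIPPED=TRUE */ comes from the question. The sketch strips the marker before returning the SQL, which is a design choice and not a requirement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-check that runs before the ANTLR parser is called. */
final class SkipMarkerRouter {

    // Matches a leading /* PARSE_PHASE_SKIPPED=TRUE */ comment and captures
    // everything after it as the SQL to execute unchanged.
    private static final Pattern MARKER = Pattern.compile(
            "^\\s*/\\*\\s*PARSE_PHASE_SKIPPED\\s*=\\s*TRUE\\s*\\*/\\s*(.*)$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Routing result: the SQL to run and whether parse/rewrite is bypassed. */
    static final class Routed {
        final String sql;
        final boolean parseSkipped;

        Routed(String sql, boolean parseSkipped) {
            this.sql = sql;
            this.parseSkipped = parseSkipped;
        }
    }

    /** Decides, without parsing, whether a statement bypasses parse and rewrite. */
    static Routed route(String originalSql) {
        Matcher m = MARKER.matcher(originalSql);
        if (m.matches()) {
            // Marked: hand the SQL after the marker straight to execution.
            return new Routed(m.group(1).trim(), true);
        }
        // Unmarked: the caller goes on to the normal parse and rewrite phases.
        return new Routed(originalSql, false);
    }

    public static void main(String[] args) {
        Routed marked = route("/* PARSE_PHASE_SKIPPED=TRUE */ select * from user");
        System.out.println(marked.parseSkipped + " -> " + marked.sql);

        Routed plain = route("select username, id_no from user u where u.id_no = '123456'");
        System.out.println(plain.parseSkipped + " -> " + plain.sql);
    }
}
```

A caller would wrap the SQL in a ParsePhaseSkippedSQLStatement when parseSkipped is true and otherwise continue with the existing ANTLR parse. Only a marker at the very start of the statement is recognized here; a marker elsewhere in the text is treated as an ordinary comment.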