I am trying to implement a parser in Java to extract the arguments of some functions.
When I have a function like:
max(1, 2, 3)
I just simply can use a Regular Expresion to extract the args.
But all my functions are not like that. If I have some nested function, eg:
max(sum(1, max(1,2,sum(2,5)), 3, 5, mult(3,3))
I would like to obtain:
sum(1, max(1,2,sum(2,5))
3
5
mult(3,3)
I tried using a Regular Expression, but I asume the language is not regular. Another approach was splliting by ','
, but did not work as well.
Is there any method for extracting the arguments of a function? I do not really know how this type of problem can be solved since there is no a pattern to use for extracting the arguments.
Any help or insight would be really appreciated. Thanks!!
CodePudding user response:
Parsing a source code into an some abstract model is quite complex topic, depending on the language complexity. But first step is usually tokenization, where you read one character at a time and detect full tokens (like variable names, function names, operators, literals etc). Since you presented only very limited scope for the problem , you have very small set of tokens:
- name of a function
(
and)
to indicate method call,
to separate arguments- numbers
Reading one symbol at the time, you should be able to very easily detect when one token ends and the next one begins. Also your tokens are very distinct (i.e. you don't have to differentiate function name from variable name), you can very easily categorize them. Once you have a token, you know the grammar (you have only function calls), you can easily build a syntax tree (where at the root you have top level function call with its arguments being children nodes). From that structure you can easily fetch whichever parts you wish.
If you are more interested in how it works in the javac
compiler, you can always check out its source code (it's open source after all):
- https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java
- https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java
However, that may be quite a long read.
CodePudding user response:
Finally found a method that works:
public List<String> parseArgs(String l){
int startIdx = l.indexOf("(") 1;
int endIdx = l.lastIndexOf(")") - 1;
int count = 0;
int argIdx = startIdx;
List<String> args = new ArrayList<>();
for (int i = startIdx; i < endIdx; i ) {
if (l.charAt(i) == '(')
count -= 1;
else if (l.charAt(i) == ')'){
count = 1;
}
else if (l.charAt(i) == ',' && count == 0){
args.add(l.substring(argIdx, i).trim());
argIdx = i 1;
}
}
args.add(l.substring(argIdx, endIdx 1).trim());
return args;
}
String l = "max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))";
parseArgs(l).forEach(System.out::println);
//Prints
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)