I'm looking for a regex that matches all calls, possibly spanning multiple lines, to a variadic C function. The end goal is to print the file, line number, and the fourth parameter of each call, but unfortunately I'm not there yet. So far, I have this:
perl -ne 'print if s/^.*?(func1\s*\(([^\)\(,] ||,|\((?2)\))*\)).*?$/$1/s' test.c
with test.c:
int main() {
func1( a, b, c, d);
func1( a, b,
c, d);
func1( func2(), b, c, d, e );
func1( func2(a), b, c, d, e );
return 1;
}
-- which does not match the second call. The reason it doesn't match is that the s at the end of the expression allows . to match newlines, but doesn't seem to allow [..] constructs to match newlines. I'm not sure how to get past this.
I'm also not sure how to reference the fourth parameter in this... the $2, $3 do not get populated in this (and even if they did I imagine I would get some issues due to the recursive nature of the regex).
CodePudding user response:
Not Perl but perhaps simpler:
$ cat >test2.c <<'EOD'
int main() {
func1( a, b, c, d1);
func1( a, b,
c, d2);
func1( func2(), "quotes\"),(", /*comments),(*/ g(b,
c), "d3", e );
func1( func2(a), b, c, d4(p,q,r), e );
func1( a, b, c, func2( func1(a,b,c,d5,e,f) ), g, h);
return 1;
}
EOD
$ cpp -D'func1(a,b,c,d,...)=SHOW(__FILE__,__LINE__,d,)' test2.c |
grep SHOW
SHOW("test2.c",2,d1);
SHOW("test2.c",3,d2)
SHOW("test2.c",5,"d3")
SHOW("test2.c",7,d4(p,q,r));
SHOW("test2.c",8,func2( SHOW("test2.c",8,d5) ));
$
As the final line shows, a bit more work is needed if the function can take itself as an argument.
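A minimal sketch of that extra work, written in Python rather than with cpp or Perl. The name find_calls is hypothetical. It scans forward from each "func1(" while counting parentheses, so nested and multi-line calls are each reported with their starting line. It assumes no // comments inside a call and does not notice a "func1(" that itself sits inside a string or comment.

```python
import re

def find_calls(src, name="func1"):
    """Yield (line_number, argument_text) for each call to `name`, nested ones included."""
    for m in re.finditer(r"\b%s\s*\(" % re.escape(name), src):
        depth, i = 1, m.end()
        while i < len(src) and depth:
            c = src[i]
            if c in "\"'":                      # skip a string or char literal
                i += 1
                while i < len(src) and src[i] != c:
                    i += 2 if src[i] == "\\" else 1
            elif src.startswith("/*", i):       # skip a block comment
                end = src.find("*/", i + 2)
                i = len(src) if end < 0 else end + 1
            elif c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            i += 1
        if depth == 0:                          # only report balanced calls
            line = src.count("\n", 0, m.start()) + 1
            yield line, src[m.end():i - 1]

if __name__ == "__main__":
    demo = "func1( a, b,\n c, d2);\nfunc1(x, func2( func1(p,q,r,s) ), y, z);\n"
    for line, args in find_calls(demo):
        print(line, repr(args))
```

The argument text still has to be split into parameters, which needs the same care with nested commas.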
CodePudding user response:
This should catch your functions, with caveats:
perl -0777 -wnE'@f = /(func1\s*\( [^;]* \))\s*;/xg; s/\s+/ /g, say for @f' tt.c
I use the fact that a statement must be terminated by ;. This assumes that there is no stray ; inside the call (in a comment, say) and that calls to this function are not nested inside another call. If either is possible then quite a bit more needs to be done to parse it.
However, further parsing the captured calls, presumably by commas, is complicated by the fact that a nested call may well, and realistically, contain commas. How about
func1( a, b, f2(a2, b2), c );
This becomes a far more interesting little parsing problem. Or, how about macros?
Can you clarify what kinds of things one doesn't have to account for?
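For the comma problem, one possible approach is to split only at commas that sit outside any brackets or quotes. Below is a minimal sketch in Python rather than Perl. The name split_args is hypothetical, and it assumes the argument text contains no comments or unexpanded macros.

```python
def split_args(text):
    """Split a C argument list at top-level commas only."""
    args, depth, start, i = [], 0, 0, 0
    while i < len(text):
        c = text[i]
        if c in "\"'":                          # skip a string or char literal
            i += 1
            while i < len(text) and text[i] != c:
                i += 2 if text[i] == "\\" else 1
        elif c in "([{":
            depth += 1
        elif c in ")]}":
            depth -= 1
        elif c == "," and depth == 0:           # a comma that separates arguments
            args.append(text[start:i].strip())
            start = i + 1
        i += 1
    args.append(text[start:].strip())           # the last argument
    return args

if __name__ == "__main__":
    print(split_args("a, b, f2(a2, b2), c"))    # ['a', 'b', 'f2(a2, b2)', 'c']
```

The fourth parameter is then split_args(...)[3], provided the call has at least four arguments.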