Automatically grep an expression between matched parenthesis

Time:09-06

My source code frequently includes a piece of code like

foo
(
    bar
    (
        foo0(<An arbitrary number of parenthesis may appear here>)
    ),
    foo1bar(<An arbitrary number of parenthesis may appear here>)
)

I want to capture this piece. My current approach is

grep -A15 -E "foo[[:space:]]*$" <file_name>

to make sure that enough lines after foo are captured.

However, a more accurate approach would be a pattern that counts the opening and closing parentheses after foo and stops searching right after the closing parenthesis that matches foo's opening one.

Is it possible to avoid scripting this algorithm by using grep options?

Example
My file is

...

foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
...
dummy
(
    nextDummy()
)
...

where ... represents lines of code which do not contain any ( or ) character. The expected output of grep is

foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
dummy
(
    nextDummy()
)

CodePudding user response:

Using any awk in any shell on every Unix box to print all the functions to stdout:

$ awk '/^\(/{$0=prev ORS $0; f=1} f; /^)/{f=0} {prev=$0}' file
foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
dummy
(
    nextDummy()
)

or to print every function to its own file:

$ awk '/^\(/{close(out); out=prev; $0=prev ORS $0; f=1} f{print > out} /^)/{f=0} {prev=$0}' file

$ head -100 foo dummy
==> foo <==
foo
(
    bar
    (
        a(b)
    ),
    c(d)
)

==> dummy <==
dummy
(
    nextDummy()
)

or if you have a specific function you want to print:

$ awk -v tgt='foo' '/^\(/ && (prev==tgt){$0=prev ORS $0; f=1} f; /^)/{f=0} {prev=$0}' file
foo
(
    bar
    (
        a(b)
    ),
    c(d)
)

$ awk -v tgt='dummy' '/^\(/ && (prev==tgt){$0=prev ORS $0; f=1} f; /^)/{f=0} {prev=$0}' file
dummy
(
    nextDummy()
)

In the above we're assuming that a function body starts with ( on a line of its own and ends with ) on a line of its own, and that the function name is the line immediately preceding the start of the body.

Assuming whatever language your source code is written in supports strings and/or comments, it's impossible to do what you want just by counting parentheses as those could appear inside strings and comments.

You can't do this job 100% robustly without writing a parser for whatever language your source code is written in. The best we can do with pattern matching against your source code is help you write a script that'll work with the subset of the language you provide as sample input/output.
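For input like the sample, where no parentheses occur inside strings or comments, the counting idea from the question can be sketched in awk as below. This is only an illustration under that assumption, not a robust solution: the function name extract_blocks, the variables depth and inblock, and the temporary sample file are all made up for the example, and a block is assumed to start with ( alone on a line directly after the name.

```shell
# Hypothetical sample input mirroring the question's example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header line
foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
middle line
dummy
(
    nextDummy()
)
trailer line
EOF

# Track nesting depth by counting "(" and ")" on each line.
# A block starts when a line consisting solely of "(" is seen at depth 0;
# the previous line is taken as the name. Printing stops once the depth
# returns to 0. Parentheses inside strings or comments are NOT handled.
extract_blocks() {
    awk '
        depth == 0 && /^\($/ { print prev; inblock = 1 }
        inblock {
            print
            line = $0
            depth += gsub(/\(/, "", line)   # gsub returns the number of replacements
            depth -= gsub(/\)/, "", line)
            if (depth == 0) inblock = 0
        }
        { prev = $0 }
    ' "$1"
}

result=$(extract_blocks "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the f flag in the scripts above, the depth counter does not depend on the closing ) being in the first column, but it is still fooled by any unbalanced parenthesis in a string or comment.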

CodePudding user response:

If your grep supports -P (PCRE) option, would you please try:

grep -zoP "[A-Za-z_]\w*\s*(\((?:[^()]+|(?1))*\))" file

Output with the provided file:

foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
dummy
(
    nextDummy()
)
  • [A-Za-z_]\w*\s* matches names such as foo or dummy, followed by possible whitespace characters (including newlines).
  • (\((?:[^()]+|(?1))*\)) matches a substring enclosed by parentheses whose contents are any sequence of either of:
    • [^()]+ : one or more characters other than parentheses
    • (?1): recursion of the pattern enclosed by the outermost parentheses
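One detail worth knowing when using this in a pipeline: with -z, grep terminates each output record with a NUL byte rather than a newline, so consecutive matches are separated by NUL. A minimal sketch, assuming GNU grep built with PCRE support and using [^()]+ as the first alternative, converts those NULs to newlines with tr. The temporary sample file and the variable name matches are made up for the example.

```shell
# Hypothetical sample input mirroring the question's example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header line
foo
(
    bar
    (
        a(b)
    ),
    c(d)
)
middle line
dummy
(
    nextDummy()
)
trailer line
EOF

# -z: treat the whole file as one record so \s* and [^()]+ can span newlines
# -o: print only the matched parts; -P: enable PCRE, needed for the (?1) recursion
# tr turns the NUL byte that ends each match under -z into a newline.
matches=$(grep -zoP '[A-Za-z_]\w*\s*(\((?:[^()]+|(?1))*\))' "$sample" | tr '\0' '\n')
printf '%s\n' "$matches"
rm -f "$sample"
```

Single quotes are used here so the shell passes the backslashes to grep untouched. As with any parenthesis-matching pattern, this assumes the parentheses in the file are balanced and do not appear inside strings or comments.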