grepping parameters from statement

Time:10-06

I have a statement that looks something like this:

DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END

I would like to extract the parameters "prog3", "prog1 arg1 arg2", "<=" (which could be any operator), "prog2 arg1" from the statement, using grep-style regex.

My command is: grep -E 'DO(.*)WHILE(.*)[<=](.*)END' <<< 'DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END' -o

The regex works on regex101.com, but grep simply returns the whole statement as the match, i.e. DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END

How can I fix this?

CodePudding user response:

grep -o prints the whole matched text; it has no way to print individual capture groups. It is easier to use sed, like this:

s='DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END'
sed -E 's/DO (.*) WHILE (.*) ([<>=]+) (.*) END/\1\n\2\n\3\n\4/' <<< "$s"

prog3
prog1 arg1 arg2
<=
prog2 arg1

Here:

  • Used [<>=]+ in a separate capture group to grab the operator text
  • Used literal spaces around the capture groups to rein in the greediness of .*
  • Used \n after each back-reference to print each group on its own line, like grep -o (\n in the replacement is understood by GNU sed)
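
If the pieces are needed in shell variables rather than printed, bash's own `[[ ... =~ ... ]]` operator is an alternative to sed. This is a different technique from the answer above, sketched under the assumption that the script runs in bash (not plain sh); the variable names `re`, `body`, `left`, `op` and `right` are only illustrative.

```shell
#!/usr/bin/env bash
s='DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END'

# Same pattern as the sed command, anchored to the whole string. It is kept
# in a variable so the shell does not try to parse <, > and the parentheses
# inside [[ ... ]].
re='^DO (.*) WHILE (.*) ([<>=]+) (.*) END$'

if [[ $s =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[4] are the capture groups.
    body=${BASH_REMATCH[1]}
    left=${BASH_REMATCH[2]}
    op=${BASH_REMATCH[3]}
    right=${BASH_REMATCH[4]}
    printf '%s\n' "$body" "$left" "$op" "$right"
else
    echo 'statement did not match' >&2
fi
```

Because no external process is started, this may be convenient inside a loop over many statements, but the regex flavour is still ERE, so lazy quantifiers such as .*? are not available.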

CodePudding user response:

With a Perl one-liner you could try the following code. It was written and tested only against the sample shown.

perl -pe 's/\bDO (.*?) WHILE ([^=><]*) ([=><]+) (.*?)\bEND\b/$1\n$2\n$3\n$4/' Input_file

Output will be as follows:

prog3
prog1 arg1 arg2
<=
prog2 arg1

Explanation of the regex used above:

\bDO         ## Word boundary, then DO.
 (.*?)       ## A space, then capture group 1: a lazy match that stops at the next part of the pattern.
 WHILE       ## A space, then WHILE.
 ([^=><]*)   ## A space, then capture group 2: any run of characters other than =, > and <.
 ([=><]+)    ## A space, then capture group 3: one or more of =, > or <.
 (.*?)       ## A space, then capture group 4: a lazy match (it includes the space before END).
\bEND\b      ## Word boundary, END, word boundary.