I have a statement that looks something like this:
DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END
I would like to extract the parameters "prog3", "prog1 arg1 arg2", "<=" (which could be any operator), "prog2 arg1" from the statement, using grep-style regex.
My command is:
grep -E 'DO(.*)WHILE(.*)[<=](.*)END' <<< 'DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END' -o
The regex works on regex101.com, but not in grep, which simply returns the whole statement as a match, i.e. DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END
How can I fix this?
CodePudding user response:
grep -o prints the whole match, not the individual capture groups. It would be better to use sed like this:
s='DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END'
sed -E 's/DO (.*) WHILE (.*) ([<>=]+) (.*) END/\1\n\2\n\3\n\4/' <<< "$s"
prog3
prog1 arg1 arg2
<=
prog2 arg1
Here:
- Used [<>=]+ in a separate capture group to grab the operator text
- Used spaces around the capture groups to handle the greediness of .*
- Used \n after each back-reference to print each group on a separate line, like grep -o
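If the goal is to get the four parts into shell variables rather than onto separate output lines, bash's own [[ =~ ]] operator is another option. This is a different technique from the sed command above, shown only as a minimal sketch: the function name parse_do_while is made up for illustration, the pattern is anchored with ^ and $ on the assumption that the statement fills the whole string, and it has only been reasoned through against the sample statement from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): split a
# "DO ... WHILE ... <op> ... END" statement into its four parts
# using bash's built-in ERE matching.
parse_do_while() {
  # Keep the regex in a variable so the spaces and brackets need no quoting.
  local re='^DO (.*) WHILE (.*) ([<>=]+) (.*) END$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; elements 1..4 are the capture groups.
    printf '%s\n' "${BASH_REMATCH[@]:1}"
  else
    return 1
  fi
}

parse_do_while 'DO prog3 WHILE prog1 arg1 arg2 <= prog2 arg1 END'
```

Inside the function, each group is also available individually as "${BASH_REMATCH[1]}" through "${BASH_REMATCH[4]}", which avoids re-parsing the printed lines.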
CodePudding user response:
With a perl one-liner you could try the following code, written and tested with your shown samples only. Here is the online demo for the regex used in the Perl program.
perl -pe 's/\bDO (.*?) WHILE ([^=><]*) ([=><]+) (.*?)\bEND\b/$1\n$2\n$3\n$4/' Input_file
Output will be as follows:
prog3
prog1 arg1 arg2
<=
prog2 arg1
Explanation: a detailed breakdown of the regex used above.
\bDO         ## Match a word boundary followed by DO.
 (.*?)       ## Match a space, then the 1st capturing group: a lazy match that stops just before the next part of the pattern.
 WHILE       ## Match a space followed by WHILE.
 ([^=><]*)   ## Match a space, then the 2nd capturing group: anything other than =, > or <.
 ([=><]+)    ## Match a space, then the 3rd capturing group: 1 or more occurrences of =, > or <.
 (.*?)       ## Match a space, then the 4th capturing group: another lazy match.
\bEND\b      ## Match a word boundary, then END, then another word boundary.