I have the following output example:
[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II
and I want to parse it with this regex: \[OK\](\s+\w+)\.(\w+)\n([^\[]+)
but when I run my shell script, which looks like this:
#!/bin/bash
# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"
# Create an empty list to hold the group lists
# Loop through the text and extract all matches
regex_pattern="\[OK\](\s+\w+)\.(\w+)\n([^\[]+)"
while [[ $text =~ $regex_pattern ]]; do
# Create a list to hold the current groups
echo "Matched_1: ${BASH_REMATCH[1]}"
echo "Matched_2: ${BASH_REMATCH[2]}"
echo "Matched_3: ${BASH_REMATCH[3]}"
echo "-------------------"
done
it does not output anything.
CodePudding user response:
Bash does not do global matching.
But what you can do: if there's a match then remove the prefix ending in the matched text from the text string.
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"
re=$'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
# no newline characters in the regex                   ^^^^^^^
while [[ $text =~ $re ]]; do
# output the match info
declare -p BASH_REMATCH
# and remove the matched text from the start of the string
# (don't forget the quotes here!)
text=${text#*"${BASH_REMATCH[0]}"}
done
outputs
declare -a BASH_REMATCH=([0]=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n' [1]="AAA" [2]="BBBBBB" [3]=$'\naaabbbcccdddfffed\nasdadadadadadsada\n')
declare -a BASH_REMATCH=([0]=$'[OK] CCC.KKKKKKK\nsome text here\n' [1]="CCC" [2]="KKKKKKK" [3]=$'\nsome text here\n')
declare -a BASH_REMATCH=([0]="[OK] OKO.II" [1]="OKO" [2]="II" [3]="")
Clearly, this destroys the $text variable, so make a copy if you need it after the loop.
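A minimal sketch of the "make a copy" advice, assuming you also want to keep the captured groups for later use. The names rest, names, kinds and bodies are hypothetical and only serve the illustration; the loop itself is the same one shown above.

```bash
#!/bin/bash
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

re='\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'

rest=$text                    # working copy (hypothetical name); only this gets consumed
names=() kinds=() bodies=()   # hypothetical arrays, one entry per match

while [[ $rest =~ $re ]]; do
    names+=("${BASH_REMATCH[1]}")
    kinds+=("${BASH_REMATCH[2]}")
    bodies+=("${BASH_REMATCH[3]}")
    # drop everything up to and including the matched text
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

echo "matches: ${#names[@]}"
for i in "${!names[@]}"; do
    printf '%s.%s\n' "${names[i]}" "${kinds[i]}"
done
```

After the loop, $text still holds the original input and the arrays can be indexed in parallel.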
The regex makes the solution a bit fragile: there cannot be any open brackets in the "following" lines.
Having said all that, this is not what bash is really good for. I'd use awk or perl for this task.
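For comparison, here is a rough awk sketch of the same task, called from bash. It is only one possible approach and makes an assumption the regex does not: a record starts at any line beginning with "[OK] ", so open brackets in the following lines no longer matter. The function names parse_records and emit_record, the output format, and the "|" separator are made up for this example.

```bash
#!/bin/bash
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

parse_records() {
    awk '
        # Print the record collected so far, if there is one.
        function emit_record() {
            if (have) printf "name=%s ext=%s body=%s\n", name, ext, body
        }
        # A header line starts a new record.
        /^\[OK\] / {
            emit_record()
            header = substr($0, 6)            # drop the "[OK] " prefix
            dot = index(header, ".")
            name = substr(header, 1, dot - 1)
            ext  = substr(header, dot + 1)
            body = ""
            have = 1
            next
        }
        # Any other line belongs to the current record; join lines with "|".
        have {
            sep = (body == "" ? "" : "|")
            body = body sep $0
        }
        END { emit_record() }
    '
}

parse_records <<< "$text"
```

Each record is printed on one line, so the result is easy to read back with a plain while read loop if the values are needed in bash again.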
CodePudding user response:
Using PCRE with grep (as explained in the comments, bash has no multiline mode):
#!/bin/bash
# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"
grep -Pzo '(?m)\[OK\](\s+\w+)\.(\w+)\n([^\[]+)' <<< "$text"
Output
[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
The regex is yours, unchanged.
See the regex101 explanation of (?m).
Or with Perl (different output):
perl -0777 -ne 'print $& if m/\[OK\](\s+\w+)\.(\w+)\n([^\[]+)/' <<< "$text"
Output
[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada