How to split by regex in shell script

Time:12-24

I have the following output example:

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II

and I want to parse it with this regex: \[OK\](\s+\w+)\.(\w+)\n([^\[]+)


But when I run my shell script, which looks like this:

#!/bin/bash

# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

# Create an empty list to hold the group lists
# Loop through the text and extract all matches
regex_pattern="\[OK\](\s+\w+)\.(\w+)\n([^\[]+)"
while [[ $text =~ $regex_pattern ]]; do
  # Create a list to hold the current groups
  echo "Matched_1: ${BASH_REMATCH[1]}"
  echo "Matched_2: ${BASH_REMATCH[2]}"
  echo "Matched_3: ${BASH_REMATCH[3]}"
  echo "-------------------"
done

it does not output anything.

CodePudding user response:

Bash does not do global matching.

What you can do instead: after each match, remove the prefix ending in the matched text from the text string, then match again.

text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

re=$'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
#                  no newline characters in the regex  ^^^^^^^

while [[ $text =~ $re ]]; do
    # output the match info
    declare -p BASH_REMATCH
    # and remove the matched text from the start of the string
    # (don't forget the quotes here!)
    text=${text#*"${BASH_REMATCH[0]}"}
done

outputs

declare -a BASH_REMATCH=([0]=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n' [1]="AAA" [2]="BBBBBB" [3]=$'\naaabbbcccdddfffed\nasdadadadadadsada\n')
declare -a BASH_REMATCH=([0]=$'[OK] CCC.KKKKKKK\nsome text here\n' [1]="CCC" [2]="KKKKKKK" [3]=$'\nsome text here\n')
declare -a BASH_REMATCH=([0]="[OK] OKO.II" [1]="OKO" [2]="II" [3]="")

Clearly, this destroys the $text variable, so make a copy if you need it after the loop.
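
As a minimal sketch of that advice, the loop below consumes a scratch copy and leaves $text alone. The names rest, names, kinds and bodies are hypothetical, chosen only for this illustration; the pattern is the one from this answer, written in plain single quotes.

```shell
#!/bin/bash

text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

re='\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'

# Hypothetical names: "rest" is the scratch copy the loop eats up,
# the three arrays collect the capture groups of every match.
rest=$text
names=()
kinds=()
bodies=()

while [[ $rest =~ $re ]]; do
    names+=("${BASH_REMATCH[1]}")
    kinds+=("${BASH_REMATCH[2]}")
    bodies+=("${BASH_REMATCH[3]}")
    # Drop everything up to and including the match from the copy only.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf 'collected %d records, first line of $text is still: %s\n' \
    "${#names[@]}" "${text%%$'\n'*}"
```

After the loop, rest is empty while text is unchanged, and the groups can be read back by index, for example "${names[1]}" and "${bodies[1]}" for the second record.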

The regex makes the solution a bit fragile: there cannot be any open brackets in the "following" lines.
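
One way around that limitation, sketched here as an alternative and not as part of the approach above, is to read the text line by line and treat only lines that look like a header as record boundaries. The sample input with "[1]" in a body line and the names header_re, names, bodies and last are assumptions made for this illustration.

```shell
#!/bin/bash

# Assumed sample input: the first record's body contains an open bracket.
text="[OK] AAA.BBBBBB
value [1] of 3
[OK] CCC.KKKKKKK
some text here"

# Anchored so that only a whole "[OK] NAME.KIND" line counts as a header.
header_re='^\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)$'
names=()
bodies=()

while IFS= read -r line; do
    if [[ $line =~ $header_re ]]; then
        # A header line starts a new record with an empty body.
        names+=("${BASH_REMATCH[1]}.${BASH_REMATCH[2]}")
        bodies+=("")
    elif (( ${#bodies[@]} > 0 )); then
        # Any other line is appended to the most recent record's body.
        last=$(( ${#bodies[@]} - 1 ))
        bodies[last]+="$line"$'\n'
    fi
done <<< "$text"

for i in "${!names[@]}"; do
    printf '%s => %q\n' "${names[i]}" "${bodies[i]}"
done
```

The trade-off is that a body line which itself looks exactly like a header would still start a new record.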


Having said all that, this is not what bash is really good for. I'd use awk or perl for this task.
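
For comparison, here is a rough sketch of what an awk version might look like, called from the same kind of bash script. The output format ("record:" and "body:" labels) and the variable name summary are made up for this example.

```shell
#!/bin/bash

text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

# "summary" is a hypothetical name; the labels are illustrative only.
summary=$(awk '
    /^\[OK\] / {
        # Header line: drop the 5-character "[OK] " prefix,
        # then split "NAME.KIND" on the literal dot.
        split(substr($0, 6), parts, /\./)
        print "record: " parts[1] " " parts[2]
        next
    }
    # Every other line is treated as body text of the current record.
    { print "  body: " $0 }
' <<< "$text")

printf '%s\n' "$summary"
```

For the sample input this prints three "record:" lines, each followed by its "body:" lines. Because awk works line by line, open brackets in the body lines are not a problem here.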

CodePudding user response:

Using PCRE with grep (as explained in the comments, bash has no multiline mode):

#!/bin/bash

# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

grep -Pzo '(?m)\[OK\](\s+\w+)\.(\w+)\n([^\[]+)' <<< "$text"

Output

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here

The regex is the one from your question, unchanged.

Check regex101 explanations about (?m)


Or with Perl (different output, because without the /g modifier only the first match is printed):

perl -0777 -ne 'print $& if m/\[OK\](\s+\w+)\.(\w+)\n([^\[]+)/' <<< "$text"

Output

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada