I have a very long text file (300-500 MB, thousands of lines), like:
blavbl
[code]
sdasdasd
asdasd
...
[/code]
line X
line Y
etc
...
[code]
...
[/code]
blabla
[code]
[/code]
I want to extract the pieces of text between [code] and [/code]. I have the following code that does the job (partially), but it is very slow:
#!/bin/bash
function split {
    file="$1"
    start_tag="$2"
    end_tag="$3"
    arr=()
    inside=0
    nfodata=$(cat "$file")
    IFS=$'\n' read -d '' -a nfoarray <<< "$nfodata"
    for line in "${nfoarray[@]}"
    do
        if [[ "$line" == "$start_tag" ]]; then
            inside=1
            continue
        fi
        if [[ "$line" == "$end_tag" ]]; then
            inside=0
            continue
        fi
        if [[ $inside == 1 ]]; then
            arr+=("$line")
        fi
    done
    printf "%s\n" "${arr[@]}"
}
split "$myfile" "[code]" "[/code]"
As I wrote, it is very slow, and I don't know if there is a better or faster approach.
The final result should be an array that contains the portions of text between [code] and [/code].
CodePudding user response:
Using sed:
sed '/^\[code\]$/,/^\[\/code\]$/!d;//d'
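To get the result into a bash array, as the question asks, the sed output can be captured with mapfile. A small self-contained sketch (the sample file built here is hypothetical):

```shell
#!/bin/bash
# Build a small sample file matching the question's layout.
myfile=$(mktemp)
printf '%s\n' 'blavbl' '[code]' 'sdasdasd' 'asdasd' '[/code]' 'line X' > "$myfile"

# !d deletes every line outside a [code]...[/code] range;
# //d then deletes the tag lines themselves.
mapfile -t arr < <(sed '/^\[code\]$/,/^\[\/code\]$/!d;//d' "$myfile")

printf '%s\n' "${arr[@]}"
rm -f "$myfile"
```

mapfile (bash 4+) reads each line of the sed output into one array element, so here ${arr[0]} is sdasdasd and ${arr[1]} is asdasd.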
Using awk:
awk '
/^\[\/code\]$/ {--c} c
/^\[code\]$/ {++c}'
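A quick way to check the awk approach, and to keep the extracted blocks distinguishable, is to print a separator whenever the counter drops back to zero. This variant is a sketch, not part of the original answer:

```shell
#!/bin/bash
# Two [code] blocks in the sample input.
input='blavbl
[code]
aaa
bbb
[/code]
middle
[code]
ccc
[/code]'

# c counts currently open [code] tags; the bare pattern "c" prints a
# line while c > 0. On each closing tag we also emit a "--" separator.
result=$(printf '%s\n' "$input" | awk '
/^\[\/code\]$/ { --c; print "--" } c
/^\[code\]$/   { ++c }')

printf '%s\n' "$result"
```

The rule order matters: the closing-tag rule runs before the bare `c` pattern, and the opening-tag rule after it, so the tag lines themselves never print.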
Either of these methods requires the tag patterns to alternate cleanly - no nested, repeated or unclosed tags.
Both print all lines inside the tags, excluding the tags themselves, e.g.:
sdasdasd
asdasd
...
...
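If a pure-bash version is still wanted, streaming the file with while read avoids slurping the whole 300-500 MB file into one variable first (though sed or awk will still be much faster). A sketch, with a hypothetical extract_blocks helper:

```shell
#!/bin/bash
# Read the file line by line; print only lines between the tags.
extract_blocks() {
    local file="$1" line inside=0
    while IFS= read -r line; do
        if [[ $line == '[code]' ]]; then
            inside=1
        elif [[ $line == '[/code]' ]]; then
            inside=0
        elif [[ $inside -eq 1 ]]; then
            printf '%s\n' "$line"
        fi
    done < "$file"
}

sample=$(mktemp)
printf '%s\n' 'x' '[code]' 'one' '[/code]' 'y' '[code]' 'two' '[/code]' > "$sample"
out=$(extract_blocks "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Quoting the right-hand side of `[[ == ]]` makes the brackets literal, so no regex escaping is needed here.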