Linux bash: How do I replace a string on a line based on a pattern on another/different line?

I have a file that contains the following data:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~

For clarity, I have inserted a blank line above each ST*850 line. Here is what I want to do:

Search for the pattern REF*ZZ*SO
If found, then replace the preceding ST*850 line with ST*850C

So the resultant file would look like this:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~

Here is what I have tried:

sed -i -n '/^REF\*ZZ\*SO/!{x;s/ST\*850\*/ST\*850C\*/;x};x;1!p;${x;p}' file

This replaces all three ST*850 lines with ST*850C, not just the 1st and the 3rd. What am I doing wrong?

CodePudding user response：

How about a Perl solution, even though Perl is not included in the tags?

perl -0777 -aF'(?=ST\*850)' -ne '
    print map {/REF\*ZZ\*SO/ && s/ST\*850/$&C/; $_} @F;
' file

Output:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~

The -0777 option tells perl to slurp the whole file at once.
The -a option enables autosplit mode; the split fragments are stored in the array @F.
The -F option specifies the pattern on which to split the input.
The regex (?=ST\*850) is a positive lookahead: it matches the zero-width position just before each ST*850, so the file is split there without consuming any text.
The -n option wraps the script in an implicit read loop without automatic printing, like sed -n, and -e supplies the script on the command line.
The map {..} @F function evaluates the block within the curly brackets for each element of @F and returns the list of results.
The statement /REF\*ZZ\*SO/ && s/ST\*850/$&C/ is translated as: "if the element of @F matches the pattern /REF\*ZZ\*SO/, then perform the substitution s/ST\*850/$&C/ on the element."
The final $_ is Perl's default variable, similar to the pattern space of sed; here it holds the current element and is the value that map returns for it.

CodePudding user response：

This might work for you (GNU sed):

sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}' file

Begin gathering up lines when a line contains ST*850.

Once the gathered lines contain REF*ZZ*SO, use a greedy match to append C to the latest ST*850 string.

N.B. The greedy regexp .* ensures that the substitution lands on the last ST*850 in the collection rather than the first. A block without REF*ZZ*SO simply stays in the collection until a later block supplies one or the input ends.
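
The effect of the greedy match can be seen on a reduced input. This is a sketch assuming GNU sed; the helper name mark_blocks and the four-line input are illustrative only.

```shell
# Hypothetical helper: run the one-liner from this answer on stdin (GNU sed).
mark_blocks() {
  sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}'
}

# Both blocks end up in one pattern space before REF*ZZ*SO shows up;
# the greedy .* makes the substitution hit the last ST*850, not the first.
printf '%s\n' 'ST*850*0002~' 'REF*ZZ*PO54361~' 'ST*850*0003~' 'REF*ZZ*SO897654~' | mark_blocks
```

Only the 0003 header should come out as ST*850C; the 0002 header is carried along unchanged.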

CodePudding user response：

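Preprocess with sed to insert a blank line before each ST*850 line (the \n in the replacement is a GNU sed feature), then treat each block as an awk record; setting RS to the empty string selects paragraph mode, e.g.:

sed 's/^ST\*850/\n&/' file | awk '/REF\*ZZ\*SO/ { sub(/ST\*850/, "&C") } 1' RS=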

CodePudding user response：

Assuming ST is essentially a record separator, you can use a simple Awk script to collect the lines in the current record, and print a modified version if the conditions are right.

awk 'BEGIN { RS = "\nST" }
    /REF\*ZZ\*SO/ { sub(/^\*850/, "*850C") }
    { printf "%s%s", (NR > 1 ? "\nST" : ""), $0 }' filename

The BEGIN clause sets the record separator (RS) to the string ST preceded by a newline, which needs an Awk that accepts a multi-character RS, such as GNU Awk or mawk. (Attempting to include the asterisk got complicated, so I avoided that.) The final block prints each record and puts the separator back in front of every record except the first; simply setting ORS to the same string would also emit a stray ST after the last record.
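
To see why the substitution is anchored on \*850 rather than ST\*850, it can help to print what each record looks like once the separator has been stripped. A minimal sketch, assuming an Awk with multi-character RS support; show_records and the sample input are made up for illustration.

```shell
# Hypothetical helper: show the first line of each record awk sees on stdin.
show_records() {
  awk 'BEGIN { RS = "\nST" }
       { split($0, lines, "\n"); printf "record %d starts with: %s\n", NR, lines[1] }'
}

printf '%s\n' 'GS*PO~' 'ST*850*0001~' 'REF*ZZ*SO1~' 'ST*850*0002~' 'REF*ZZ*PO2~' | show_records
```

Every record after the first starts with *850, because the leading ST was consumed as part of the separator.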

sed is rather unwieldy for anything beyond simple line-based substitutions; I think you will find that switching to a higher-level language is going to improve maintainability.

CodePudding user response：

Pure Bash: much more verbose but hopefully does not require any additional explanation.

#! /bin/bash

init_chunk()
{
  prefix=$1
  suffix=$2
  chunk=()
  refzzso=
}

print_chunk()
{
  if (( ${#chunk[@]} > 0 )); then
    if [[ $refzzso == true ]]; then
      printf '%sC%s\n' "$prefix" "$suffix"
    else
      printf '%s%s\n' "$prefix" "$suffix"
    fi
    printf '%s\n' "${chunk[@]}"
  fi
}

init_chunk
while IFS= read -r line; do
  # Check for header.
  if [[ $line =~ ^(ST\*850)(.*) ]]; then
    # Print previous chunk.
    print_chunk
    # Begin new chunk.
    init_chunk "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    continue
  fi
  # Check if in a chunk.
  if [[ $prefix ]]; then
    # Check for modifier.
    if [[ $line =~ ^REF\*ZZ\*SO ]]; then
      refzzso=true
    fi
    chunk+=("$line")
  else
    printf '%s\n' "$line"
  fi
done
# Print last chunk.
print_chunk

CodePudding user response：

The reason why your solution substitutes all occurrences is that you never accumulate lines: you only swap back and forth between pattern and hold spaces, so the hold space holds just the previous line, and the substitution is applied to it whenever the current line is not a REF*ZZ*SO line. What you need is a kind of buffering: keep the lines of a block together until you know whether the block contains REF*ZZ*SO. This is typically done by appending the pattern space to the hold space until a condition is fulfilled.

With sed (GNU sed syntax):

sed -n '/^ST\*850\*/{x;1!{/\nREF\*ZZ\*SO/s/^ST\*850\*/ST*850C*/;p};$!b;g;p;b};
        1h;1!H;$!b;
        g;/\nREF\*ZZ\*SO/s/^ST\*850\*/ST*850C*/;p' file

If it is a ST*850* line, swap pattern and hold spaces, so that the pattern space holds the block buffered so far and the hold space holds the new ST*850* line. Unless it is the first line, do the substitution if the buffered block contains a REF*ZZ*SO line, then print the block. If it is also the last line, copy the ST*850* line back from the hold space and print it. Start a new cycle.
Else, store the line: on the first line replace the hold space by the pattern space, on any other line append the pattern space to the hold space. Unless it is the last line, start a new cycle.
On the last line, copy the hold space to the pattern space, do the substitution if the block contains a REF*ZZ*SO line, and print.
Buffering the whole block until the next ST*850* line (rather than printing as soon as REF*ZZ*SO is seen) avoids printing the REF*ZZ*SO line twice when more lines of the same block follow it, as in the first block of the sample.
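
Separately from the answer above, any of the commands in this thread can be checked against the question's data with a small harness along these lines. It is a sketch: the function names are illustrative, and it assumes the real file has no blank lines between blocks, as the question suggests.

```shell
# Sample data from the question, without the blank lines added for readability.
sample_input() {
  cat <<'EOF'
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
EOF
}

# Expected result: only blocks 0001 and 0003 contain REF*ZZ*SO.
expected_output() {
  sample_input | sed -e 's/^ST\*850\*0001/ST*850C*0001/' -e 's/^ST\*850\*0003/ST*850C*0003/'
}

# check CMD [ARGS...]: feed the sample to CMD on stdin and compare the result.
check() {
  if [ "$(sample_input | "$@")" = "$(expected_output)" ]; then
    echo PASS
  else
    echo FAIL
  fi
}

check cat    # unchanged input, so this reports FAIL
check sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}'
```

The second call runs the GNU sed one-liner from an earlier answer; any other filter that reads standard input can be passed to check in the same way.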