I have a file that contains the following data:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
For clarity, I have inserted a blank line above each ST*850 line. Here is what I want to do:
- Search for the pattern REF*ZZ*SO
- If found, then replace the preceding ST*850 line with ST*850C
So the resultant file would look like this:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
Here is what I have tried:
sed -i -n '/^REF\*ZZ\*SO/!{x;s/ST\*850\*/ST\*850C\*/;x};x;1!p;${x;p}' file
This replaces all three ST*850 lines with ST*850C, and not just the 1st and the 3rd. What am I doing wrong?
CodePudding user response:
How about a perl solution, although perl is not included in the tags?
perl -0777 -aF'(?=ST\*850)' -ne '
print map {/REF\*ZZ\*SO/ && s/ST\*850/$&C/; $_} @F;
' file
Output:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
- The -0777 option tells perl to slurp the whole file at once.
- The -a option enables auto split mode; the split fragments are stored in the array @F.
- The -F option specifies the pattern used to split the input.
- The regex (?=ST\*850) is a positive lookahead which matches at the position just before each ST*850 string, so the split does not consume any text.
- The -ne options are mostly equivalent to those of sed.
- The map {..} @F function transforms every element of @F according to the statement within the curly brackets.
- The statement /REF\*ZZ\*SO/ && s/ST\*850/$&C/ is translated as: "if the element of @F matches the pattern /REF\*ZZ\*SO/, then perform the substitution s/ST\*850/$&C/ for the element."
- The final $_ is perl's default variable, similar to the pattern space of sed, and becomes the return value of the map block.
CodePudding user response:
This might work for you (GNU sed):
sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}' file
Begin gathering up lines if a line contains ST*850.
On matching a line that contains REF*ZZ*SO, use greed to append C to the latest ST*850 string.
N.B. The regexp .* ensures that the match will backtrack from the end of the collection rather than the start of the collection.
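A small sketch of why the greedy .* matters, assuming GNU sed. The wrapper name mark_so_orders and the input are made up: the first order carries a PO reference, so sed keeps collecting lines into the second order before REF*ZZ*SO finally matches.

```shell
# Hypothetical wrapper around the command above; requires GNU sed.
mark_so_orders() {
  sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}'
}

# Made-up input: order 0001 has a PO reference, order 0002 an SO reference.
printf '%s\n' 'ST*850*0001~' 'REF*ZZ*PO1~' 'ST*850*0002~' 'REF*ZZ*SO2~' |
  mark_so_orders
```

By the time the substitution runs, the pattern space holds all four lines. The leading .* consumes as much as possible, so the C lands on the second header (ST*850C*0002~) and order 0001 is left alone; a plain s/ST\*850/&C/ would have marked the first header instead.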
CodePudding user response:
Preprocess with sed to insert newlines, then treat each block as an awk record, e.g.:
sed 's/^ST\*850/\n&/' file | awk '/REF\*ZZ\*SO/ { sub(/ST\*850/, "&C") } 1' RS=
CodePudding user response:
Assuming ST is essentially a record separator, you can use a simple Awk script to collect the lines in the current record, and print a modified one if the conditions are right.
awk 'BEGIN { ORS = RS = "\nST" }
/REF\*ZZ\*SO/ { sub(/^\*850/, "*850C") }1' filename
The BEGIN clause sets the record separator (RS) and also the output record separator (ORS) to the string ST preceded by a newline. (Attempting to include the asterisk got complicated, so I avoided that.) The final 1 is the common Awk shorthand for "print everything which reaches here".
sed is rather unwieldy for anything beyond simple line-based substitutions; I think you will find that switching to a higher-level language is going to improve maintainability.
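The same "collect the current record, then decide" idea can also be sketched without a multi-character RS, which POSIX does not require awk to support. This is only an illustration: the names mark_so_blocks and emit are made up, and the script assumes every order starts with a line beginning with ST*850.

```shell
# Hypothetical helper: buffer each ST*850 block and rewrite its header
# only after the whole block has been read.
mark_so_blocks() {
  awk '
    function emit(    i) {          # print the buffered block, if any
      if (n == 0) return
      if (found) sub(/^ST\*850/, "&C", buf[1])
      for (i = 1; i <= n; i++) print buf[i]
      n = 0; found = 0
    }
    /^ST\*850/     { emit() }       # a new header closes the previous block
    /^REF\*ZZ\*SO/ { found = 1 }    # remember that this block needs the C
    { buf[++n] = $0 }               # buffer every line, header included
    END { emit() }                  # flush the last block
  ' "$@"
}

# Usage: mark_so_blocks file
```

Lines before the first ST*850 header (such as the GS line) are buffered and printed unchanged, because the substitution only applies when the first buffered line starts with ST*850.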
CodePudding user response:
Pure Bash: much more verbose but hopefully does not require any additional explanation.
#! /bin/bash

init_chunk()
{
    prefix=$1
    suffix=$2
    chunk=()
    refzzso=
}

print_chunk()
{
    # Arithmetic test: [[ ... > 0 ]] would compare strings, not numbers.
    if (( ${#chunk[@]} > 0 )); then
        if [[ $refzzso == true ]]; then
            printf '%sC%s\n' "$prefix" "$suffix"
        else
            printf '%s%s\n' "$prefix" "$suffix"
        fi
        printf '%s\n' "${chunk[@]}"
    fi
}

init_chunk

while read -r line; do
    # Check for header.
    if [[ $line =~ ^(ST\*850)(.*) ]]; then
        # Print previous chunk.
        print_chunk
        # Begin new chunk.
        init_chunk "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        continue
    fi
    # Check if in a chunk.
    if [[ $prefix ]]; then
        # Check for modifier.
        if [[ $line =~ ^REF\*ZZ\*SO ]]; then
            refzzso=true
        fi
        # Append the line to the current chunk.
        chunk+=("$line")
    else
        printf '%s\n' "$line"
    fi
done

# Print last chunk.
print_chunk
CodePudding user response:
The reason why your solution substitutes all occurrences is that you do not append the lines, you only swap back and forth between pattern and hold spaces. What you need is a kind of buffering until one or the other of the special lines are encountered. This is typically done by appending the pattern space to the hold space until a condition is fulfilled.
With sed (GNU sed):
sed -n '/^ST\*850\*/{x;1!p;b};
/^REF\*ZZ\*SO/{x;s/ST\*850\*/ST*850C*/;1!p;$!b;x;p;b};
1{h;b};H;${x;p}' file
- If it is a ST*850* line, swap pattern and hold spaces. Then, if it is not the first line, print. Start a new cycle. The hold space contains the ST*850* line. The preceding lines that were stored in the hold space, if any, have been printed.
- Else, if it is a REF*ZZ*SO line, swap pattern and hold spaces and do the substitution. Then, if it is not the first line, print. If it is the last line, swap again and print the REF*ZZ*SO line itself. Start a new cycle. The hold space contains the REF*ZZ*SO line. The preceding lines that were stored in the hold space, if any, have been printed (after modification).
- Else, if it is the first line, replace the hold space by the pattern space and start a new cycle. The hold space thus contains the first line.
- Else append the pattern space to the hold space. If it is the last line, swap pattern and hold spaces and print.