Trying to get generic sed multi-line pattern match and substitution script to work

Time:02-04

There is a generic approach to the problem raised by the original poster, described in section 4.23.3 of the sed FAQ (the copy I used is linked in the script below). It appears to offer a way to match any complex multi-line block and replace it with any other complex block. The technique is referred to as the "sliding-window" technique.
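
For reference, the core of the sliding-window idea can be sketched with a two-line window. This is a minimal illustration, not the FAQ's script: the function name slide_demo, the sample lines and the REPLACED marker are made up for the example. `$!N` appends the next input line to the window, the `s` command tries the two-line match, `P` prints the first line of the window, and `D` drops that line and restarts the cycle with whatever is left.

```shell
#!/bin/sh
# Two-line sliding window over stdin (hypothetical example data).
#   $!N  append the next line unless we are on the last one
#   s    try to match the two-line block and replace it
#   P    print the first line of the window
#   D    delete that line and restart without reading new input
slide_demo() {
    sed -e '$!N' \
        -e 's/^foo\nbar$/REPLACED/' \
        -e 'P' \
        -e 'D'
}

out=$(printf 'one\nfoo\nbar\ntwo\n' | slide_demo)
printf '%s\n' "$out"
```

With the sample input this prints one, REPLACED and two on separate lines. The generated custom.sed shown later uses the same P/D pair, with one N per additional line of the block to be matched.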

I believe the script below faithfully recreates the scenario described and incorporates the FAQ's sed script to demonstrate that the approach is workable.

#!/bin/bash

DBG=1

###
### Code segment to be replaced
###
file1="File1.cpp"
rm -f "${file1}"
cat >"${file1}" <<"EnDoFiNpUt"
void Component::initialize()
{
    my_component = new ComponentClass();
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 1"

###
### Code segment to be used as replacement
###
file2="File2.cpp"
rm -f "${file2}"
cat >"${file2}" <<"EnDoFiNpUt"
void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 2"

###
### Create demo input file
###
testfile="Test_INPUT.cpp"
rm -f "${testfile}"
{
    echo "
other code1()
{
    doing other things
    doing more things
    doing extra things
} 
"
    cat "${file1}"

echo "
other code2()
{
    creating other things
    creating more things
    creating extra things
} 
"
} >>"${testfile}"

test ${DBG} -eq 1 && echo "fence 3"

###
### Create editing specification file
###
{
    cat "${file1}"
    echo "###REPLACE_BY###"
    cat "${file2}"
} >findrep.txt

test ${DBG} -eq 1 && echo "fence 4"


###
### sed script to create editing instructions to apply above editing specification file
###
cat >"blockrep.sed" <<"EnDoFiNpUt"
#SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
#
# filename: blockrep.sed
#   author: Paolo Bonzini
# Requires:
#    (1) blocks to find and replace, e.g., findrep.txt
#    (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the separator line (a row of 4 hyphens
# in the FAQ, ###REPLACE_BY### here), globally replacing them with
# the lower block of text. GNU sed is recommended but not required.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
:a
/^###REPLACE_BY###$/! {
   # Escape slashes, backslashes, the final newline and
   # regular expression metacharacters.
   s,[/\[.*],\\&,g
   s/$/\\/
   H
   #
   # Append N cmds needed to maintain the sliding window.
   x
   1 s,^.,s/,
   1! s/^/N\
/
   x
   n
   ba
}
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
x
s,\\$,/,
x
#
# Until EOF, gather the substitution into hold space.
:b
n
s,[/\],\\&,g
$! s/$/\\/
H
$! bb
#
# Start the RHS of the s command without a leading
# newline, add the P/D pair for the sliding window, and
# print the script.
g
s,/\n,/,
s,$,/\
P\
D,p
#---end of script---
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 5"


sed --debug -nf blockrep.sed findrep.txt >custom.sed
test ${DBG} -eq 1 && echo "fence 6"

if [ -s custom.sed ]
then
    more custom.sed
    echo -e "\t Hit return to continue ..." >&2
    read k <&2
else
    echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
    exit 1
fi

testout="Test_OUTPUT.cpp"

sed -f custom.sed "${testfile}" >"${testout}"
test ${DBG} -eq 1 && echo "fence 7"

if [ -s "${testout}" ]
then
    more "${testout}"
else
    echo -e "\t Failed to create '${testout}'.\n" >&2
    exit 1
fi

Unfortunately, what they presented doesn't seem to work as published. I wish sed had something like bash's "set -x", reporting each command to stderr as it executes, but I haven't found anything like that.
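
The closest built-in equivalent appears to be GNU sed's --debug option (already used in the script above), which prints the program in canonical form and annotates its execution. A lighter-weight alternative, sketched here on the assumption that GNU sed is in use (it treats /dev/stderr as a special file name for the `w` command), is to insert `w /dev/stderr` at the point of interest, so that the pattern space is copied to stderr while normal output stays on stdout. The function name trace_demo and the sample data are hypothetical.

```shell
#!/bin/sh
# Poor man's trace: copy the pattern space to stderr before editing it.
trace_demo() {
    # First command: write the line, as read, to stderr.
    # Second command: the edit being traced.
    sed -e 'w /dev/stderr' -e 's/foo/bar/'
}

# Capture the two streams separately to show that they do not mix.
out=$(printf 'foo\n' | trace_demo 2>/dev/null)
trace=$(printf 'foo\n' | trace_demo 2>&1 >/dev/null)
printf 'stdout: %s\nstderr: %s\n' "$out" "$trace"
```

Here stdout carries the edited line (bar) and stderr carries the line as it was before the substitution (foo), so a generated script redirected with `>custom.sed` would not be polluted by the trace.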

The execution log for the above is as follows:

fence 1
fence 2
fence 3
fence 4
fence 5
sed: file blockrep.sed line 19: unterminated `s' command
fence 6
     Failed to create 'custom.sed'.  Unable to proceed!

Maybe an expert out there can resolve the error in the imported blockrep.sed script, because I can't get my head around it well enough to fix it, even with all the comments provided.

I openly admit that my knowledge and usage of sed are very limited. I can't follow how blockrep.sed does what it claims, only that it states that all content of findrep.txt before the separator line "###REPLACE_BY###" is to be replaced by all content after that separator.

In my view, the approach described in the linuxtopia guide would have broad application and benefit many people, including the OP and myself.

CodePudding user response:

I resorted to splitting the compound substitutions in blockrep.sed into separate commands to see whether I could identify where it broke down. Although the split version looks logically equivalent, it parses cleanly and produces a usable custom.sed, but only once I removed the --debug option from the blockrep.sed run. That is necessary because the debug information is not sent to stderr; it is written inline with stdout. I don't know enough to classify that as a bug.
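
A plausible explanation for the original failure, based on POSIX bracket-expression rules rather than on the sed source: inside a bracket expression the pair `[.` opens a collating symbol, so in `[/\[.*]` the `[` followed by `.` is not read as two literal characters, the expression never closes, and sed reports an unterminated `s` command. Splitting the expression separates `[` from `.`, which would explain why the split version parses. Under that assumption, a single-command equivalent only needs `[` moved to the end of the set. The function name escape_lhs and the sample input are hypothetical.

```shell
#!/bin/sh
# Escape /, \, ., * and [ in one command. '[' is placed last so that
# it is never followed by '.', ':' or '=' inside the brackets.
escape_lhs() {
    sed 's,[/\.*[],\\&,g'
}

# Sample line containing each of the five characters once.
escaped=$(printf 'a/b\\c[d.e*f\n' | escape_lhs)
printf '%s\n' "$escaped"
```

Each of the five characters comes out prefixed with a backslash, which is what the first substitution in blockrep.sed is meant to achieve for the search side of the generated `s` command.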

The working version of the script is as follows:

#!/bin/bash

DBG=1

file1=""
file2=""
#divider=""
doReview=0
while [ $# -gt 0 ]
do
    case $1 in
        --old_pattern ) file1="$2" ; shift ; shift ;;
        --new_pattern ) file2="$2" ; shift ; shift ;; 
        #--pattern_sep ) divider="$2" ; shift ; shift ;;    ### Not yet implemented
        --review ) doReview=1 ; shift ;;
        * ) echo -e "\n Invalid option used on command line.  Valid options: [ --old_pattern {textfile1} | --new_pattern {textfile2} | --review ]\n Bye!\n" >&2 ; exit 1 ;;
    esac
done

###
### Code segment to be replaced
###
if [ -z "${file1}" ]
then
file1="File1.cpp"
rm -f "${file1}"
cat >"${file1}" <<"EnDoFiNpUt"
void Component::initialize()
{
    my_component = new ComponentClass();
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 1"
fi

###
### Code segment to be used as replacement
###
if [ -z "${file2}" ]
then
file2="File2.cpp"
rm -f "${file2}"
cat >"${file2}" <<"EnDoFiNpUt"
void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 2"
fi

###
### Create demo input file
###
testfile="Test_INPUT.cpp"
rm -f "${testfile}"
{
    echo "
other code1()
{
    doing other things
    doing more things
    doing extra things
} 
"
    cat "${file1}"

echo "
other code2()
{
    creating other things
    creating more things
    creating extra things
} 
"
} >>"${testfile}"

test ${DBG} -eq 1 && echo "fence 3"

###
### Create editing specification file
###
{
    cat "${file1}"
    echo "###REPLACE_BY###"
    cat "${file2}"
} >findrep.txt

test ${DBG} -eq 1 && echo "fence 4"


###
### sed script to create editing instructions to apply above editing specification file
###
if [ ! -s "blockrep.sed" ]
then
cat >"blockrep.sed" <<"EnDoFiNpUt"
#SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
#
# filename: blockrep.sed
#   author: Paolo Bonzini
# Requires:
#    (1) blocks to find and replace, e.g., findrep.txt
#    (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the separator line (a row of 4 hyphens
# in the FAQ, ###REPLACE_BY### here), globally replacing them with
# the lower block of text. GNU sed is recommended but not required.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
##############################################################################
### Original coding from linuxtopia
##############################################################################
### :markerA
### /^###REPLACE_BY###$/! {
###     # Escape slashes, backslashes, the final newline and
###     # regular expression metacharacters.
###     s,[/\[.*],\\&,g
###     ### add a backslash to the end of the line so that sed does not treat the newline as the end of the command
###     s,$,\\,
###     H
###     #
###     # Append N cmds needed to maintain the sliding window.
###     x
###     1 s,^.,s/,
###     1! s,^,N\
### ,
###     x
###     n
###     b markerA
### }
##############################################################################
### Discretized version of coding
##############################################################################
:markerA
/^###REPLACE_BY###$/! {
    #
    # Escape slashes
    s,[/],\\&,g
    #
    # Escape backslashes
    s,[\],\\&,g
    #
    # Escape regular expression metacharacters
    s,[[],\\&,g
    s,[.],\\&,g
    s,[*],\\&,g
    #
    # Escape the final newline:
    #   add a backslash to the end of the line so that sed does not
    #   treat the newline as the end of the command
    s,$,\\,
    H
    #
    # Append N cmds needed to maintain the sliding window.
    x
    1 s,^.,s/,
    1! s,^,N\
,
    x
    n
    b markerA
}
##############################################################################
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
x
s,\\$,/,
x
#
# Until EOF, gather the substitution into hold space.
:markerB
n
s,[/],\\&,g
s,[\],\\&,g
$! s,$,\\,
H
$! b markerB
#
# Start the RHS of the s command without a leading
# newline, add the P/D pair for the sliding window, and
# print the script.
g
s,/\n,/,
s,$,/\
P\
D,p
#---end of script---
EnDoFiNpUt
fi

test ${DBG} -eq 1 && echo "fence 5"


rm -fv custom.sed custom.err
#sed --debug -f blockrep.sed findrep.txt >custom.sed 2>custom.err
sed -nf blockrep.sed findrep.txt >custom.sed 2>custom.err
if [ -s custom.err ]
then
    if [ ${doReview} -eq 1 ]
    then
        cat custom.err
    fi
fi
test ${DBG} -eq 1 && echo "fence 6"

if [ -s custom.sed ]
then
    if [ ${doReview} -eq 1 ]
    then
        more custom.sed
        echo -e "\t Hit return to continue ..." >&2
        read k <&2
    fi
else
    echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
    exit 1
fi

testout="Test_OUTPUT.cpp"

sed -f custom.sed "${testfile}" >"${testout}"
test ${DBG} -eq 1 && echo "fence 7"

if [ -s "${testout}" ]
then
    if [ ${doReview} -eq 1 ]
    then
        more "${testout}"
    fi
else
    echo -e "\t Failed to create '${testout}'.\n" >&2
    exit 1
fi

exit

The resulting session output (run with --review) is:

fence 1
fence 2
fence 3
fence 4
fence 5
removed 'custom.sed'
removed 'custom.err'
fence 6
N
N
N
s/void Component::initialize()\
{\
    my_component = new ComponentClass();\
}/void Component::initialize()\
{\
    if (doInit)\
    {\
        my_component = new ComponentClass();\
    }\
    else\
    {\
        my_component.ptr = null;\
    }\
}/
P
D
     Hit return to continue ...

fence 7

other code1()
{
    doing other things
    doing more things
    doing extra things
} 

void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}

other code2()
{
    creating other things
    creating more things
    creating extra things
} 

This is the result that was originally intended. Success!
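
One limitation worth keeping in mind: on the replacement side, blockrep.sed escapes only slashes and backslashes. In the right-hand side of an `s` command an unescaped `&` stands for the matched text, so a replacement block containing `&` (common in C++) would presumably not come out verbatim. A possible adjustment, sketched below and not part of the FAQ script, is to escape `&` as well. The function name escape_rhs and the sample strings are hypothetical.

```shell
#!/bin/sh
# Escape /, \ and & so the text is taken literally on the
# right-hand side of an s/old/new/ command.
escape_rhs() {
    sed 's,[/\&],\\&,g'
}

# Escape a replacement line, then use it in a generated s command.
rhs=$(printf '%s\n' 'int &ref = a/b;' | escape_rhs)
result=$(printf 'OLD\n' | sed "s/OLD/$rhs/")
printf '%s\n' "$result"
```

With the extra escape the replacement line is reproduced unchanged; without it, the `&` would be expanded to the matched text (OLD in this sketch).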
