Can sed replace words only inside a pattern-matched substring of a line?

original line in file sed.txt:

outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string

I only need to replace PATTERN with pattern where it occurs inside brackets. This is not about lowercasing; the replacement could be any other word.

Expected result:

outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

I can use the ([^)]*) pattern to find the substrings in which some words should be replaced. But I can't use this pattern to restrict the replacement to those substrings: used as an address, it replaces every PATTERN on the whole line with pattern.

:/tmp$ sed 's/([^)]*)/---/g' sed.txt 
outer_string_PATTERN_string---PATTERN_outer_string---_outer_string

:/tmp$ sed '/([^)]*)/s/PATTERN/pattern/g' sed.txt 
outer_string_pattern_string(pattern_And_pattern_pattern_i)pattern_outer_string(i_pattern_inner)_outer_string

I also tried to use regex groups in sed to capture and replace the words, but I can't figure out the command.

Can sed implement that? And how to achieve that? Thanks.

CodePudding user response：

Can sed implement that?

Yes. But you do not want to do it in sed. Use another programming language, like Python, Perl, or awk.

how to achieve that?

Implementing non-greedy matching is not simple in sed. In general, it consists of:

taking a chunk of the input
processing the chunk
putting it in the hold space
shuffling the hold space with the pattern space to separate what has already been processed from what has not
repeating
shuffling with the hold space once more
printing the output

Anyway, the following script:

#!/bin/bash
sed <<<'outer_string_PATTERN_string(PATTERN_i_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' '
    :loop;
    /\([^(]*\)\(([^)]*)\)\(.*\)/{
        # Lowercase the second part.
        s//\1\L\2\E\n\3/;
        # Mix with hold space.
        G;
        s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/;
        # Put processed stuff into hold space.
        h; s/\n.*//; x;
        # Process the other stuff again.
        s/.*\n//;
        bloop;
    };
    # Is hold space empty?
    x; /^$/!{
        # Pattern space has trailing stuff - add it.
        G; s/\n//;
        # We will print it.
        h;
        # Clear hold space
        s/.*//
    };x;
'

outputs:

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

CodePudding user response：

As an alternative, it is easier to do this in GNU awk with an RS that matches each (...) substring:

awk -v RS='\\([^)]+\\)' '{gsub(/PATTERN/, "pattern", RT); ORS=RT} 1' file

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

Steps:

RS='\\([^)]+\\)' makes each (...) substring a record separator
the gsub function then replaces PATTERN with pattern in the matched separator text, i.e. RT
ORS=RT sets ORS to the modified RT, so the separator is printed back after each record
1 prints each record to stdout
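To see how the records and separators are split, a small sketch like the following can help. It assumes GNU awk is installed under the name gawk (RT is a gawk extension); the function name and the sample input are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each record together with the separator text (RT)
# that ended it. Assumes GNU awk is available as "gawk".
show_records() {
    gawk -v RS='\\([^)]+\\)' '{ printf "record=[%s] RT=[%s]\n", $0, RT }'
}

if command -v gawk >/dev/null 2>&1; then
    # The text outside brackets becomes the records; each (...) lands in RT.
    printf '%s' 'a(b)c(d)e' | show_records
else
    echo 'gawk not found; this sketch needs GNU awk' >&2
fi
```

The text outside the brackets never passes through gsub, which is why only the bracketed parts are modified.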

Another alternative uses a lookahead assertion in a Perl regex:

perl -pe 's/PATTERN(?=[^()]*\))/pattern/g' file

CodePudding user response：

Solved by this:

:/tmp$ sed 's/(/\n(/g' sed.txt | sed 's/)/)\n/g' | sed '/([^)]*)/s/PATTERN/pattern/g' | sed ':a;N;$!ba;s/\n//g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

put each (...) group on its own line
find the (...) lines and replace PATTERN with pattern
merge the lines back into one line

Thanks to "How can I replace a newline (\n) using sed?" for the last step.
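One caveat with this pipeline: its last stage joins every line of the input, so a file with several lines comes out as a single line. A minimal sketch of a per-line variant follows, assuming GNU sed (for \n in the replacement text); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the split / replace / merge pipeline on one input
# line at a time, so the line structure of the input is preserved.
# Assumes GNU sed, which accepts \n in the replacement text.
# Stage 1 puts each (...) group on its own line, stage 2 edits only those
# lines, stage 3 merges everything back into a single line.
replace_per_line() {
    while IFS= read -r line; do
        printf '%s\n' "$line" |
            sed 's/(/\n(/g; s/)/)\n/g' |
            sed '/([^)]*)/s/PATTERN/pattern/g' |
            sed ':a;N;$!ba;s/\n//g'
    done
}

printf '%s\n' 'a_PATTERN(PATTERN_x)' 'PATTERN(y_PATTERN)_b' | replace_per_line
```

This starts three sed processes per line, so it is only reasonable for small inputs.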

CodePudding user response：

Can sed implement that?

It can be done using GNU sed and basic regular expressions (BRE):

sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
' < file

where

1st s inserts a newline after each )
2nd s replaces the last PATTERN (because * is greedy) inside a (...) group with pattern
t loops back if a substitution was made
3rd s strips all inserted newlines

EDIT

2nd substitute command edited according to OP's suggestion since there is no need to match \n inside ().
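Since the question mentions that the replacement could be any word, the same loop can be wrapped in a small function that takes both words as arguments. This is only a sketch: the function name is hypothetical, and it assumes GNU sed, words without regex metacharacters, "/" or "&", and a replacement that does not itself contain the search word (otherwise the loop would never end).

```shell
#!/bin/sh
# Hypothetical helper: replace word $1 with word $2 inside (...) groups only.
# Assumes GNU sed, words free of regex metacharacters, "/" and "&", and a
# replacement that does not contain $1 (otherwise the t loop never ends).
replace_word_in_brackets() {
    sed -e 's/)/)\n/g' \
        -e ':again' \
        -e 's/\(([^)]*\)'"$1"'\([^)]*)\n\)/\1'"$2"'\2/' \
        -e 't again' \
        -e 's/\n//g'
}

printf '%s\n' 'out_PATTERN(PATTERN_And_PATTERN)PATTERN(i_PATTERN_in)_out' |
    replace_word_in_brackets PATTERN word
```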

CodePudding user response：

You can try this sed

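sed -E 's/\([^)]*PATTERN[^)]*\)/\L&/g'

Here, we match a whole bracketed group only if it contains the word PATTERN, and \L& (a GNU sed extension) lowercases the entire match. Any other uppercase letters inside those brackets are lowercased as well, so this only suits the case where the replacement is the lowercase form of the word.

Output

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

New Example Output

A line whose bracketed groups contain no uppercase PATTERN is left unchanged:

echo "outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string" | sed -E 's/\([^)]*PATTERN[^)]*\)/\L&/g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string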