Can sed replace words only inside a pattern-matched substring of a line?

Time:09-21

Original line in file sed.txt:

outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string

I only need to replace PATTERN with pattern where it appears inside brackets. It is not really about lowercasing: the replacement could be any other word.

Expected result:

outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

I can use the pattern ([^)]*) to find the substrings in which some words should be replaced, but I can't use it to restrict the substitution to those substrings' positions: used as an address it only selects the line, so every PATTERN on the line is replaced with pattern.

:/tmp$ sed 's/([^)]*)/---/g' sed.txt 
outer_string_PATTERN_string---PATTERN_outer_string---_outer_string

:/tmp$ sed '/([^)]*)/s/PATTERN/pattern/g' sed.txt 
outer_string_pattern_string(pattern_And_pattern_pattern_i)pattern_outer_string(i_pattern_inner)_outer_string

I also tried using regex groups in sed to capture and replace the words, but I couldn't figure out the right command.

Can sed do this? And if so, how? Thanks.

Answer:

Can sed implement that?

Yes. But you do not want to do it in sed. Use another programming language, such as Python, Perl, or awk.

How to achieve that?

Implementing a non-greedy regex is not simple in sed. In general, it consists of:

  • take a chunk of the input
  • process the chunk
  • put it in the hold space
  • swap the hold and pattern spaces to separate what has already been processed from what has not
  • repeat
  • swap with the hold space
  • output

Anyway, the following script:

#!/bin/bash
sed <<<'outer_string_PATTERN_string(PATTERN_i_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' '
    :loop;
    /\([^(]*\)\(([^)]*)\)\(.*\)/{
        # Lowercase the second part.
        s//\1\L\2\E\n\3/;
        # Mix with hold space.
        G;
        s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/;
        # Put processed stuff into hold space
        h; s/\n.*//; x;
        # Process the other stuff again.
        s/.*\n//;
        bloop;
    };
    # Is hold space empty?
    x; /^$/!{
        # Pattern space has trailing stuff - add it.
        G; s/\n//;
        # We will print it.
        h;
        # Clear hold space
        s/.*//
    };x;
'

outputs:

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
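The script above lowercases the whole bracketed group with \L, while the OP wants to replace PATTERN with an arbitrary word. Below is a hypothetical variant of the same hold-space loop, sketched under the assumption of GNU sed (it relies on \n in replacements and on [^\n] inside bracket expressions). It moves the group to the front, runs a small t-loop that substitutes only on that first line, then restores the order and continues exactly as the original script does. The replacement "pattern" is just an example.

```shell
#!/bin/bash
# Hypothetical variant of the hold-space loop: substitute PATTERN inside each
# (...) group instead of lowercasing it. Assumes GNU sed.
input='outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string'

result=$(printf '%s\n' "$input" | sed '
    :loop
    /\([^(]*\)\(([^)]*)\)\(.*\)/{
        # Reorder as: group, newline, prefix, newline, rest.
        s//\2\n\1\n\3/
        # Replace PATTERN only on the first line, which holds the group.
        :sub
        s/^\([^\n]*\)PATTERN/\1pattern/
        tsub
        # Restore the order: prefix and edited group, newline, rest.
        s/^\([^\n]*\)\n\([^\n]*\)\n/\2\1\n/
        # From here on, identical to the original script.
        G
        s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/
        h; s/\n.*//; x
        s/.*\n//
        bloop
    }
    x; /^$/!{ G; s/\n//; h; s/.*//; }; x
')
printf '%s\n' "$result"
```

Unlike the \L version, only PATTERN changes here, so "And" inside the first group keeps its capital letter.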

Answer:

As an alternative, it is easier to do this in GNU awk, with an RS that matches a (...) substring:

awk -v RS='\\([^)]+\\)' '{gsub(/PATTERN/, "pattern", RT); ORS=RT} 1' file

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

Steps:

  • RS='\\([^)]+\\)' matches a (...) substring and uses it as the record separator
  • the gsub function then replaces PATTERN with pattern in the matched text, i.e. RT
  • ORS=RT sets ORS to the modified RT
  • 1 prints each record to stdout

Another alternative, using a lookahead assertion in a Perl regex:

perl -pe 's/PATTERN(?=[^()]*\))/pattern/g' file

Answer:

Solved by this:

:/tmp$ sed 's/(/\n(/g' sed.txt | sed 's/)/)\n/g' | sed '/([^)]*)/s/PATTERN/pattern/g' | sed ':a;N;$!ba;s/\n//g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
  • put each (...) group on its own line
  • on the lines containing a (...) group, replace PATTERN with pattern
  • merge the lines back into one line

Thanks to the question "How can I replace a newline (\n) using sed?"
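To make the intermediate state visible, here is a sketch of the same pipeline with the split step captured separately. It assumes GNU sed (\n in the replacement inserts a newline) and merges the first two sed calls into one, since both are plain substitutions on the same line.

```shell
#!/bin/bash
# The pipeline above, with the split step shown on its own.
input='outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string'

# Newline before every "(" and after every ")": each group gets its own line
split=$(printf '%s\n' "$input" | sed 's/(/\n(/g; s/)/)\n/g')
printf '%s\n' "$split"

# Substitute only on lines holding a (...) group, then join all lines again
result=$(printf '%s\n' "$split" |
    sed '/([^)]*)/s/PATTERN/pattern/g' |
    sed ':a;N;$!ba;s/\n//g')
printf '%s\n' "$result"
```

The split prints five lines (outer_string_PATTERN_string, the first group, PATTERN_outer_string, the second group, _outer_string), so the address /([^)]*)/ selects only lines 2 and 4.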

Answer:

Can sed implement that?

It can be done using GNU sed and basic regular expressions (BRE):

sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
' < file

where

  • 1st s inserts a newline after each )
  • 2nd s replaces the last PATTERN inside a () group with pattern (the last one, because * is greedy)
  • t loops back if a substitution was made
  • 3rd s strips all inserted newlines

EDIT

The 2nd substitute command was edited following the OP's suggestion, since there is no need to match \n inside ().
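Since the OP noted that the replacement need not be a lowercase version, here is the same t-loop with "pattern" swapped for "word", a purely hypothetical replacement chosen to show that any string works. It assumes GNU sed, which accepts \n in both the regex and the replacement.

```shell
#!/bin/bash
# Same loop as above; only the replacement text differs ("word" is arbitrary).
input='outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string'

result=$(printf '%s\n' "$input" | sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1word\2/
t1
s/\n//g
')
printf '%s\n' "$result"
```

Each pass rewrites one occurrence, so a group with three PATTERNs takes three trips through the loop before t1 stops branching.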

Answer:

You can try this sed command:

sed -E 's/\([^)]*PATTERN[^)]*\)/\L&/g'

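Here, we lowercase (\L&) each bracketed group that contains PATTERN, so PATTERN outside brackets is left alone. Note that the whole group is lowercased, including any other uppercase letters inside it.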

Output

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

New Example Output

echo "outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string" | sed -E 's/\(.?PATTERN.?[^)]*\)/\L&/g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
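The example just above feeds in a line that is already lowercased, so it cannot show any change. A sketch on the OP's original line, assuming GNU sed (\L is a GNU extension) and using [^)]* on both sides of PATTERN so each match stays within one (...) group:

```shell
#!/bin/bash
# Lowercase every (...) group that contains PATTERN.
input='outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string'

result=$(printf '%s\n' "$input" | sed -E 's/\([^)]*PATTERN[^)]*\)/\L&/g')
printf '%s\n' "$result"
```

The output contains "(pattern_and_pattern_pattern_i)": "And" is lowercased as well, so this only fits when lowercasing the whole group is acceptable, which the OP said is not the general case.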