sed and Perl regexp replaces once, with multiple replacements flag

Time:10-04

I have the string:

lopy,lopy1,sym,lopy,lopy1,sym

I want the line to be:

lopy,lopy1,sym,lady,lady1,sym

Which means that all "lopy" after the string sym should be replaced. So I ran:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | sed -r 's/(.*sym.*?)lopy/\1lad/g'

I get:

lopy,lopy1,sym,lopy,lad1,sym

Using Perl is not really better:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(.*sym.+?)lopy/${1}lad/g'

yields

lopy,lopy1,sym,lad,lopy1,sym

Not all "lopy" are replaced. What am I doing wrong?

CodePudding user response:

The (.*sym.*?)lopy and (.*sym.+?)lopy patterns are almost the same: .+? matches one or more chars other than line break chars, as few as possible, while .*? matches zero or more such chars. Mind that sed does not support lazy quantifiers, so *? is the same as * in sed. However, the main problem with the regexps you used is that every match must contain sym, then any text after it, and then lopy. Adding g only asks for further, non-overlapping matches of that whole sym...lopy sequence, and the first match already consumes the only sym that still has a lopy after it. So there is only one such occurrence in your string.

You want to replace all lopy after sym, so you can use

perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'

Details:

  • (?:\G(?!^)|sym) - sym or end of the previous match (\G(?!^))
  • .*? - any zero or more chars other than line break chars, as few as possible
  • \K - match reset operator that discards all text matched so far
  • lopy - a lopy string.

See the online demo:

#!/bin/bash
echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'
# => lopy,lopy1,sym,lad,lad1,sym

If the values are always comma separated and the lopy fields directly follow one another, you may replace .*? with a comma: (?:\G(?!^)|sym),\Klopy

CodePudding user response:

sed does not support non-greedy wildcards at all. But your Perl script also fails for other reasons; you are saying "match all occurrences of this" but then you specify a regex which can only match once.

A common simple solution is to split the string, and then replace only after the match:

echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 'if (@x = /^(.*?sym,)(.*)/) { $x[1] =~ s/lop/lad/g; s/.*/$x[0]$x[1]/ }'

If you want to be fancy, you can use a lookbehind to only replace the lop occurrences after the first sym.

echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 's/(?<=sym.{0,200})lop/lad/g'

The variable-length lookbehind generates a warning and is only supported in Perl 5.30 and later (you can turn the warning off with no warnings qw(experimental::vlb);).

CodePudding user response:

The problem is that the lopys to replace must be after sym so a global replacement looks for yet more of the whole lopys-after-sym, not just for all lopys.

To replace all lopys (after the first sym, followed by another sym) we can capture the substring between the syms and run code in the replacement side, where a nested substitution replaces all lopys:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | 
    perl -pe's{ sym,\K (.+?) (?=sym) }{ $1 =~ s/lop/lad/gr }ex'

To isolate only the substring with lopys after sym I use \K after the sym, which drops everything matched before it from the match, and a positive lookahead for the sym after it, which doesn't consume anything. In the replacement side's regex we must use /r since $1 is read-only and can't be changed in place, and we want the substitution to return the modified string rather than a count of replacements.
