I have the string:
lopy,lopy1,sym,lopy,lopy1,sym
I want the line to be:
lopy,lopy1,sym,lady,lady1,sym
Which means that all "lopy" after the string sym should be replaced. So I ran:
echo "lopy,lopy1,sym,lopy,lopy1,sym" | sed -r 's/(.*sym.*?)lopy/\1lad/g'
I get:
lopy,lopy1,sym,lopy,lad1,sym
Using Perl is not really better:
echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(.*sym.+?)lopy/${1}lad/g'
yields
lopy,lopy1,sym,lad,lopy1,sym
Not all "lopy" are replaced. What am I doing wrong?
CodePudding user response:
The (.*sym.*?)lopy and (.*sym.+?)lopy patterns are almost the same: .+? matches one or more chars other than line break chars, but as few as possible, and .*? matches zero or more such chars. Mind that sed does not support lazy quantifiers; *? is the same as * in sed. However, the main problem with the regexps you used is that they match sym, then any text after it, and then lopy. So when you added g, it only asked for further matches of that whole sym....lopy sequence after the previous match. And there is only one such occurrence in your string.
You want to replace all lopy after sym, so you can use
perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'
Details:
(?:\G(?!^)|sym) - sym or the end of the previous match (\G(?!^))
.*? - any zero or more chars other than line break chars, as few as possible
\K - match reset operator that discards all text matched so far
lopy - a lopy string.
See the online demo:
#!/bin/bash
echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'
# => lopy,lopy1,sym,lad,lad1,sym
If the values are always comma separated, you may replace .*? with a comma: (?:\G(?!^)|sym),\Klopy
CodePudding user response:
sed does not support non-greedy wildcards at all. But your Perl script also fails for other reasons; you are saying "match all occurrences of this" but then you specify a regex which can only match once.
A common simple solution is to split the string, and then replace only after the match:
echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 'if (@x = /^(.*?sym,)(.*)/) { $x[1] =~ s/lop/lad/g; s/.*/$x[0]$x[1]/ }'
If you want to be fancy, you can use a lookbehind to only replace the lop occurrences after the first sym.
echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 's/(?<=sym.{0,200})lop/lad/g'
The variable-length lookbehind generates a warning and is only supported in Perl 5.30 and later (you can turn the warning off with no warnings qw(experimental::vlb);).
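Since the question started with sed, here is a sed-only sketch as well. It is not taken from the answers above: it works around the missing lazy quantifiers with a loop that replaces one lopy after the first sym per pass and repeats until the substitution stops matching. It follows the question's own commands in replacing lopy with lad, and it relies on the replacement text not containing lopy, otherwise the loop would never end.

```shell
line='lopy,lopy1,sym,lopy,lopy1,sym'

# :a defines a label, and "ta" jumps back to it whenever the s command changed something.
# Each pass rewrites the last remaining lopy that follows the first sym.
result=$(printf '%s\n' "$line" | sed -E -e ':a' -e 's/(sym.*)lopy/\1lad/' -e 'ta')

echo "$result"   # lopy,lopy1,sym,lad,lad1,sym
```

The separate -e expressions keep the label on its own, which avoids differences in how sed implementations parse a semicolon after a label.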
CodePudding user response:
The problem is that the lopys to replace must be after sym, so a global replacement looks for yet more of the whole lopys-after-sym, not just for all lopys.
To replace all lopys (after the first sym, followed by another sym) we can capture the substring between the syms and run code in the replacement side, in which a regex replaces all lopys:
echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe's{ sym,\K (.+?) (?=sym) }{ $1 =~ s/lop/lad/gr }ex'
To isolate only the substring with lopys after sym I use \K after the sym, which drops matches prior to it, and a positive lookahead for the sym after it, which doesn't consume anything. In the replacement side's regex we must use /r, since $1 is read-only and cannot be changed in place, and we want the substitution to return the modified string.