Finding the exact match with regular expression while other similar variants present

Time:12-28

I'm currently using sed to change strings in a file.

Inside the text I have for example:

service:123,service:234,service:345
test-service

I would like to change only the text that is followed by a port number into a certain address such as "127.0.0.1", so the first line should look like "127.0.0.1:123" etc.

However, with the command I use, sed -e 's/service/127.0.0.1/g' my_file, the string "test-service" is also changed, which I don't intend. I've looked up many threads about changing only the exact match, but none of them suit this purpose.

CodePudding user response:

The usual fix is to tighten the regular expression so that it no longer matches just the bare string, because a bare string also matches as a substring inside longer strings.

sed out of the box sometimes supports a word boundary regex (depending on the regex dialect, \< / \> or \b) but here, you also want to avoid matching on a hyphen, which usually does qualify as a word boundary, so you basically have to roll your own.
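
To see why a plain word boundary is not enough here, the following sketch runs the sample lines from the question through a \b-anchored substitution. It assumes GNU sed, where \b is available; the variable names input and boundary_result are only there for illustration.

```shell
# Sample input from the question.
input='service:123,service:234,service:345
test-service'

# Assumes GNU sed, where \b matches at a word boundary.
# A hyphen is not a word character, so there is a boundary between
# "test-" and "service", and the second line is rewritten as well.
boundary_result=$(printf '%s\n' "$input" | sed -e 's/\bservice\b/127.0.0.1/g')

printf '%s\n' "$boundary_result"
```

With GNU sed the second line comes out as test-127.0.0.1, which is exactly the change the question wants to avoid.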

In so many words, a word boundary is either (a) a non-word character (for example, let's say you define this to mean [^A-Za-z0-9_-] where we conspicuously include hyphen) or (b) an empty string, i.e. the match is adjacent to the beginning or end of string.

Some sed variants will allow you to say

s/\(^\|[^A-Za-z0-9_-]\)service\([^A-Za-z0-9_-]\|$\)/\1127.0.0.1\2/

but others will require a different formulation. Perhaps the simplest and most portable solution in this case is to capture and restore everything before the word boundary:

s/^\(.*[^A-Za-z0-9_-]\)\?service:/\1127.0.0.1:/
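
Because this expression is anchored at the start of the line and .* is greedy, a single pass rewrites only one service: per line. A minimal sketch of one way to catch them all is to repeat the substitution in a loop until nothing changes. The label name again and the variable names are arbitrary, and \? is assumed to be supported (it is a GNU extension in basic regular expressions; a strictly POSIX spelling would be \{0,1\}).

```shell
# Sample input from the question.
input='service:123,service:234,service:345
test-service'

# ":again" defines a label; "t again" jumps back to it whenever the
# preceding s command changed something, so the substitution is retried
# until no unconverted "service:" remains on the line.
loop_result=$(printf '%s\n' "$input" | sed -e ':again' \
    -e 's/^\(.*[^A-Za-z0-9_-]\)\?service:/\1127.0.0.1:/' \
    -e 't again')

printf '%s\n' "$loop_result"
```

The loop ends on its own because the replacement text does not contain service:, so every pass leaves one fewer candidate. The line test-service is left alone.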

In case it's not obvious, the capturing parentheses \(...\) collect the string which was matched into a back-reference \1 which we can use in the replacement string to put back the same text we matched. (When there are multiple capturing parentheses, they are numbered from the left, so the one corresponding to the first left parenthesis is \1, the second is \2, etc.)
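
As a small illustration of the numbering, the sketch below captures two groups and puts them back in the opposite order. The input string and the variable name swapped are made up for the example.

```shell
# Two groups: \1 is the text before the colon, \2 the text after it.
# The replacement emits them in reverse order.
swapped=$(printf '%s\n' 'service:123' |
    sed -e 's/^\([a-z]*\):\([0-9]*\)$/\2:\1/')

printf '%s\n' "$swapped"   # prints 123:service
```

Back-references in sed are a single digit, \1 through \9, which is why a replacement like \1127.0.0.1 is read as \1 followed by the literal text 127.0.0.1 rather than as a reference to group 1127.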

With modern sed variants you can add an -r or -E option to switch to a less cumbersome regex variant where you don't have to backslash the "extended" regex metacharacters ?, (, | and ). The historic accident which caused a backslash to mean the precise opposite of "quote this character" in front of some regex metacharacters was quite unfortunate, but we are stuck with it for maximally portable sed scripts.
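
For comparison, here is a sketch of the alternation-based expression written in the extended syntax, assuming a sed that accepts -E (GNU sed and current BSD seds do). The g flag is added so that every occurrence on the line is handled, and the variable names are illustrative only.

```shell
# Sample input from the question.
input='service:123,service:234,service:345
test-service'

# With -E the grouping parentheses and the | alternation need no backslashes.
# \1 restores whatever preceded "service" (nothing at the start of the line,
# otherwise one separator character) and \2 restores what followed it.
ere_result=$(printf '%s\n' "$input" |
    sed -E -e 's/(^|[^A-Za-z0-9_-])service([^A-Za-z0-9_-]|$)/\1127.0.0.1\2/g')

printf '%s\n' "$ere_result"
```

Each match consumes the separator on either side, so two occurrences that share a single separator character (for example service,service) would not both be rewritten in one pass. That does not arise in the sample input, where a port number sits between occurrences.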
