Replace all occurrences of a character unless surrounded by two different patterns

Time:04-13

I want to find a regex (preferably in Perl, but any flavour will do) to replace every _ except those preceded by exactly 8 digits and followed by exactly 6 digits.
Concretely, I want to replace _ in filenames, except in dates of the form YYYYMMDD_hhmmss.
More generally, I want to replace every occurrence of a character unless it is both preceded by one pattern and followed by another.
I tried many regexes and searched the web at length, but I did not find anything!

I know it is possible to replace every _ by ., then restore the _ in YYYYMMDD.hhmmss, but I am interested in doing it in one step (hoping it is possible).
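For reference, the two-step fallback described above can be sketched as follows. This is a minimal illustration in Python's `re` module (chosen because the question accepts any flavour); the function name `two_step` is hypothetical.

```python
import re


def two_step(name: str) -> str:
    """Replace every '_' with '.', then restore the '_' inside date stamps."""
    # Step 1: replace every underscore with a dot.
    dotted = name.replace("_", ".")
    # Step 2: put the underscore back between exactly 8 and exactly 6 digits.
    return re.sub(r"(?<!\d)(\d{8})\.(\d{6})(?!\d)", r"\1_\2", dotted)


print(two_step("foo_12345678_123456_bar"))
print(two_step("12345678_1234567"))
```

One caveat of this approach: a name that already contains `YYYYMMDD.hhmmss` with a dot would get an underscore it never had, which is one more reason to look for a single-pass regex.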

Here are some examples of replacements:

Patate_17890505_TitreEnCamelCase.ext  -->  Patate.17890505.TitreEnCamelCase.ext
EPFL_AlgebreLineaire                  -->  EPFL.AlgebreLineaire
ipe.20210302_005606.pdf               -->  ipe.20210302_005606.pdf
1_                                    -->  1.
12_                                   -->  12.
_1                                    -->  .1
_12                                   -->  .12
12345678_                             -->  12345678.
_123456                               -->  .123456
12345678_12345                        -->  12345678.12345
1234567_123456                        -->  1234567.123456
1234567_12345                         -->  1234567.12345
123456_12345                          -->  123456.12345
12345678_1234567                      -->  12345678.1234567
123456789_123456                      -->  123456789.123456
123456789_1234567                     -->  123456789.1234567
_patate__truc__                       -->  .patate..truc..
___                                   -->  ...
foo_12345678                          -->  foo.12345678
foo_12345678_123456_bar               -->  foo.12345678_123456.bar
12345678_123456                       -->  12345678_123456
foo12345678_123456bar                 -->  foo12345678_123456bar

Below are a few of the attempts I made.


This one does exactly the opposite of what I want, i.e. it replaces every _ preceded by exactly 8 digits and followed by exactly 6 digits (try it on regex101):

s/((?<!\d)(?:\d{8}))_((?:\d{6})(?!\d))/$1.$2/g

It works, so I need the negation of this regex…


Just a negative lookbehind and a negative lookahead (try it on regex101):

s/(?<!\d{8})_(?!\d{6})/./g

Fails: the _ is not replaced whenever it is preceded by 8 digits or followed by 6 digits, e.g. in these strings:

12345678_
_123456
12345678_12345
1234567_123456

I need to replace every _ except when both conditions hold (“and”), but this regex replaces every _ except when either condition holds (“or”), so it misses some _.
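The "and" versus "or" problem can be reproduced with a short script. This is a sketch using Python's `re` module rather than Perl; the names `NAIVE` and `samples` are illustrative only. A match requires both lookarounds to succeed, so the regex encodes "not preceded AND not followed", which by De Morgan's law is "not (preceded OR followed)".

```python
import re

# Two independent negative lookarounds: the underscore is matched only if
# it is NOT preceded by 8 digits AND NOT followed by 6 digits.
NAIVE = re.compile(r"(?<!\d{8})_(?!\d{6})")

samples = ["12345678_", "_123456", "12345678_12345", "1234567_123456"]

# In each sample only one of the two conditions holds, yet nothing is replaced.
unchanged = [s for s in samples if NAIVE.sub(".", s) == s]
print(unchanged == samples)
```

The underscore in each of these strings should have become a dot, because only a full `\d{8}_\d{6}` date stamp is meant to be protected.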


Inspired by this answer (from "python regex: match a char surrounded by exactly 2 chars") (try it on regex101):

s/(?<!(?<!\d)\d{8})_(?!\d{6}(?!\d))/./g

Fails: same reason as the previous one.
The regex in the original answer works because it replaces characters that are preceded by a pre-pattern and followed by a post-pattern, which is the positive case rather than its negation.


Inspired by this answer (from "Replace character UNLESS surrounded by specific tag"), although I do not really understand how it works (try it on regex101):

s/_(?:(?!(?:.*?\d{6}))|(?=[^\d] \d{8}))/./g

Fails: in these examples, the _ is not replaced:

_123456
1234567_123456
12345678_1234567
123456789_123456
123456789_1234567
foo_12345678

The original problem is quite close to mine, but instead of \d{8} and \d{6}, the pre-pattern and post-pattern are HTML tags, so the problem is easier: <tag> and </tag> are unique elements, whereas in my problem the post-pattern \d{6} could be followed by another digit (likewise, the pre-pattern \d{8} could be preceded by another digit).
But this one almost works: unlike the previous attempt, it replaces the _ in both of these strings:

12345678_
12345678_12345

so perhaps a modification could make it work as I want…

CodePudding user response:

You can use

(?<!\d)\d{8}_\d{6}(?!\d)(*SKIP)(*F)|_

See the regex demo. Details:

  • (?<!\d)\d{8}_\d{6}(?!\d) - eight digits, _ and six digits not enclosed with any other digits
  • (*SKIP)(*F) - fail the match at the current location and continue the regex search from the failure location
  • | - or
  • _ - an underscore in any other context.
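The (*SKIP)(*F) verbs are specific to Perl and PCRE. In flavours without them, the same "match the protected form first, otherwise match a bare underscore" idea can be emulated with a replacement callback. The sketch below uses Python's `re` module and is a swapped-in technique, not the answer's own code; `DATE_OR_UNDERSCORE` and `replace_underscores` are hypothetical names.

```python
import re

# First alternative: a full date stamp, which is matched so it can be kept.
# Second alternative: any underscore that was not consumed by the first.
DATE_OR_UNDERSCORE = re.compile(r"(?<!\d)\d{8}_\d{6}(?!\d)|_")


def replace_underscores(name: str) -> str:
    """Turn '_' into '.', leaving YYYYMMDD_hhmmss stamps untouched."""
    return DATE_OR_UNDERSCORE.sub(
        # A lone underscore becomes a dot; a date stamp is re-emitted as is.
        lambda m: "." if m.group(0) == "_" else m.group(0),
        name,
    )


print(replace_underscores("foo_12345678_123456_bar"))
print(replace_underscores("123456789_123456"))
```

Because the date stamp is consumed as a whole match, its underscore is never offered to the second alternative, which mirrors what skipping past the failed match achieves in PCRE.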

An alternative regex is

_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))

See this regex demo. Details:

  • _ - an underscore
  • (?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are exactly six digits (not followed by a seventh) and the text immediately to the left is exactly eight digits (not preceded by a ninth) plus the underscore.
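The lookaround-only alternative uses a fixed-width lookbehind, so it should also work in flavours without backtracking verbs. As a rough check, the sketch below runs it through Python's `re` module against the replacement table from the question; `PATTERN`, `CASES` and `failures` are illustrative names.

```python
import re

# Match '_' unless it sits between exactly 8 digits and exactly 6 digits.
PATTERN = re.compile(r"_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))")

# Input -> expected output, copied from the examples in the question.
CASES = {
    "Patate_17890505_TitreEnCamelCase.ext": "Patate.17890505.TitreEnCamelCase.ext",
    "EPFL_AlgebreLineaire": "EPFL.AlgebreLineaire",
    "ipe.20210302_005606.pdf": "ipe.20210302_005606.pdf",
    "1_": "1.",
    "12_": "12.",
    "_1": ".1",
    "_12": ".12",
    "12345678_": "12345678.",
    "_123456": ".123456",
    "12345678_12345": "12345678.12345",
    "1234567_123456": "1234567.123456",
    "1234567_12345": "1234567.12345",
    "123456_12345": "123456.12345",
    "12345678_1234567": "12345678.1234567",
    "123456789_123456": "123456789.123456",
    "123456789_1234567": "123456789.1234567",
    "_patate__truc__": ".patate..truc..",
    "___": "...",
    "foo_12345678": "foo.12345678",
    "foo_12345678_123456_bar": "foo.12345678_123456.bar",
    "12345678_123456": "12345678_123456",
    "foo12345678_123456bar": "foo12345678_123456bar",
}

# Collect every case whose actual output differs from the expected one.
failures = {
    src: PATTERN.sub(".", src)
    for src, want in CASES.items()
    if PATTERN.sub(".", src) != want
}
print(failures)
```

An empty `failures` dictionary means every example from the question is handled as requested.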