I want to find a regex (preferably in Perl, but any flavour will do) to replace every _ except those preceded by exactly 8 digits and followed by exactly 6 digits.
More precisely, I want to replace _ in filenames, except in dates of the form YYYYMMDD_hhmmss.
Generally speaking, I want to replace every occurrence of some character unless it is both preceded by one pattern and followed by another pattern.
I tried many regexes and searched a lot on the web, but I did not find anything!
I know it is possible to replace every _ by . and then restore the _ in YYYYMMDD.hhmmss, but I am interested in doing it in one step (hoping it is possible).
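For reference, the two-step workaround mentioned above could look roughly like this. It is only a sketch in Python (any flavour with lookarounds should behave the same way), and the function name dots_two_step is made up for the illustration. It assumes the original name does not already contain a stamp written with a dot (YYYYMMDD.hhmmss), because step 2 would turn that dot into an underscore.

```python
import re

def dots_two_step(name: str) -> str:
    """Hypothetical helper: replace every _ by ., then restore date stamps."""
    # Step 1: turn every underscore into a dot.
    dotted = name.replace("_", ".")
    # Step 2: put the underscore back inside an isolated stamp, i.e.
    # exactly 8 digits, a dot, exactly 6 digits, with no digit on either side.
    return re.sub(r"(?<!\d)(\d{8})\.(\d{6})(?!\d)", r"\1_\2", dotted)

print(dots_two_step("foo_12345678_123456_bar"))
print(dots_two_step("12345678_1234567"))
```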
Here are some examples of replacements:
Patate_17890505_TitreEnCamelCase.ext --> Patate.17890505.TitreEnCamelCase.ext
EPFL_AlgebreLineaire --> EPFL.AlgebreLineaire
ipe.20210302_005606.pdf --> ipe.20210302_005606.pdf
1_ --> 1.
12_ --> 12.
_1 --> .1
_12 --> .12
12345678_ --> 12345678.
_123456 --> .123456
12345678_12345 --> 12345678.12345
1234567_123456 --> 1234567.123456
1234567_12345 --> 1234567.12345
123456_12345 --> 123456.12345
12345678_1234567 --> 12345678.1234567
123456789_123456 --> 123456789.123456
123456789_1234567 --> 123456789.1234567
_patate__truc__ --> .patate..truc..
___ --> ...
foo_12345678 --> foo.12345678
foo_12345678_123456_bar --> foo.12345678_123456.bar
12345678_123456 --> 12345678_123456
foo12345678_123456bar --> foo12345678_123456bar
Below are a few of the regexes I tried.
This one does exactly the opposite of what I want, i.e. it replaces every _ preceded by exactly 8 digits and followed by exactly 6 digits (try it on regex101):
s/((?<!\d)(?:\d{8}))_((?:\d{6})(?!\d))/$1.$2/g
It works, so I need the negation of this regex…
Just a negative lookbehind and a negative lookahead (try it on regex101):
s/(?<!\d{8})_(?!\d{6})/./g
Fails: it does not replace the _ if it is preceded by 8 digits or followed by 6 digits, e.g. the _ is not replaced in these strings:
12345678_
_123456
12345678_12345
1234567_123456
I need to replace every _ except when both conditions hold (“and”), but this regex replaces every _ except when either condition holds (“or”), so it misses some _.
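The difference can be checked directly. The sketch below uses Python's re module instead of Perl's s///, purely for illustration; the pattern is the same one as in the attempt above. By De Morgan's law, "not (A and B)" means "(not A) or (not B)", whereas two separate negative lookarounds require "(not A) and (not B)".

```python
import re

# Same pattern as the attempt above: BOTH negative lookarounds must succeed
# before the underscore is matched.
naive = re.compile(r"(?<!\d{8})_(?!\d{6})")

# In each of these strings only one of the two conditions holds,
# so the underscore should become a dot...
should_change = ["12345678_", "_123456", "12345678_12345", "1234567_123456"]

# ...but the pattern leaves every one of them untouched.
unchanged = [s for s in should_change if naive.sub(".", s) == s]
print(unchanged)
```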
Inspired by this answer (from “python regex: match a char surrounded by exactly 2 chars”) (try it on regex101):
s/(?<!(?<!\d)\d{8})_(?!\d{6}(?!\d))/./g
Fails: same reason as the previous one.
The regex in the original answer works because it replaces characters that are preceded by a pre-pattern and followed by a post-pattern, which is the opposite of what I need.
Inspired by this answer (from “Replace character UNLESS surrounded by specific tag”), but I do not really understand how it works (try it on regex101):
s/_(?:(?!(?:.*?\d{6}))|(?=[^\d] \d{8}))/./g
Fails: in these examples, the _ is not replaced:
_123456
1234567_123456
12345678_1234567
123456789_123456
123456789_1234567
foo_12345678
The original problem is quite close to mine, but instead of \d{8} and \d{6}, the pre-pattern and post-pattern are HTML tags, so the problem is easier: <tag> and </tag> are unique elements, whereas in my problem the post-pattern \d{6} could be followed by another digit (likewise, the pre-pattern \d{8} could be preceded by another digit).
But this one almost works: unlike the previous attempt, it replaces the _ in both of these strings:
12345678_
12345678_12345
so perhaps a modification could make it work as I want…
CodePudding user response:
You can use
(?<!\d)\d{8}_\d{6}(?!\d)(*SKIP)(*F)|_
See the regex demo. Details:
(?<!\d)\d{8}_\d{6}(?!\d) - eight digits, _ and six digits not enclosed with any other digits
(*SKIP)(*F) - fail the match at the current location and continue the regex search from the failure location
| - or
_ - an underscore in any other context.
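The (*SKIP)(*F) verbs are a Perl/PCRE feature; Python's built-in re module does not support them. A rough way to get the same effect there is to let the first alternative consume the whole date stamp and hand it back unchanged from a replacement callback. This is only a sketch of that idea, and the name dots_via_callback is made up for the example.

```python
import re

# Alternative 1 consumes an isolated YYYYMMDD_hhmmss stamp as a whole;
# alternative 2 matches any underscore that was not consumed that way.
PATTERN = re.compile(r"(?<!\d)\d{8}_\d{6}(?!\d)|_")

def dots_via_callback(name: str) -> str:
    """Hypothetical helper emulating the skip-and-fail idea with a callback."""
    def repl(match: re.Match) -> str:
        text = match.group(0)
        # A lone underscore becomes a dot; a date stamp is returned as is.
        return "." if text == "_" else text
    return PATTERN.sub(repl, name)

print(dots_via_callback("foo_12345678_123456_bar"))
print(dots_via_callback("123456789_123456"))
```

In Perl itself the pattern from the answer can go straight into a substitution: s/(?<!\d)\d{8}_\d{6}(?!\d)(*SKIP)(*F)|_/./g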
An alternative regex is
_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))
See this regex demo. Details:
_ - an underscore
(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are six (and no more than six) digits immediately preceded with exactly eight digits and an underscore.
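This second pattern uses only lookarounds, and its lookbehind has a fixed width (eight digits plus the underscore), so it should also work outside Perl. Below is a small sketch with Python's re module that runs it against some of the examples from the question; the name dots_via_lookahead and the choice of samples are mine.

```python
import re

# An underscore, unless it sits between exactly 8 digits and exactly 6 digits.
PATTERN = re.compile(r"_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))")

def dots_via_lookahead(name: str) -> str:
    """Hypothetical helper: replace every underscore outside a date stamp."""
    return PATTERN.sub(".", name)

# A few input/expected pairs copied from the question.
examples = {
    "Patate_17890505_TitreEnCamelCase.ext": "Patate.17890505.TitreEnCamelCase.ext",
    "ipe.20210302_005606.pdf": "ipe.20210302_005606.pdf",
    "1234567_123456": "1234567.123456",
    "12345678_1234567": "12345678.1234567",
    "123456789_123456": "123456789.123456",
    "foo_12345678_123456_bar": "foo.12345678_123456.bar",
    "foo12345678_123456bar": "foo12345678_123456bar",
}

for source, expected in examples.items():
    result = dots_via_lookahead(source)
    print(source, "-->", result, "(ok)" if result == expected else "(unexpected)")
```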