Regexp - match till dot, but without last character

Time:01-04

I have big files with multiple entries like

car.bus.bike:
car.bus.bike.vehicle
car.bus.bike
_car.bus.bike
'car.bus.bike'

I want to match and replace car.bus.bike without including the trailing :. It should not match when it is followed by .vehicle or when it is prefixed by a character such as _.

In the end I want to replace car.bus.bike with cat.mouse.dog, giving:

cat.mouse.dog:
car.bus.bike.vehicle
cat.mouse.dog
_car.bus.bike
'cat.mouse.dog'

I tried matching up to the . with [^-_.]$, but that also matches the :. I also tried a positive lookahead (?=\:) and a negative lookbehind (?<!_), but each attempt covers only one case.

CodePudding user response:

You can search using this regex:

(?<!\.)\b\pL\w*(?:\.\w+){2}(?=[':]|$)

And replace it with:

cat.mouse.dog

RegEx Demo

RegEx Details:

  • (?<!\.): Assert that we don't have dot at the previous position
  • \b: Match a word boundary
  • \pL: Match a unicode letter
  • \w*: Match 0 or more word characters
  • (?:\.\w+){2}: Match a dot followed by 1 or more word characters. Repeat this group 2 times
  • (?=[':]|$): Assert that we have a : or ' or end of line at the next position

For PHP use this regex:

/(?<!\.)\b\pL\w*(?:\.\w+){2}(?=[':]|$)/mu
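
As a rough check of the search-and-replace above, here is a minimal Python sketch run against the five sample lines from the question. Python's built-in re module does not support \pL, so [^\W\d_] is used as an approximate stand-in for a Unicode letter (an assumption of this sketch, not part of the answer). The repeated group is written with \w+, one or more word characters per segment, and re.MULTILINE plays the role of the m flag.

```python
import re

# Assumption: [^\W\d_] approximates \pL, which the stdlib re module lacks.
PATTERN = re.compile(
    r"(?<!\.)\b[^\W\d_]\w*(?:\.\w+){2}(?=[':]|$)",
    re.MULTILINE,  # lets $ match at the end of every line
)

sample = "\n".join([
    "car.bus.bike:",
    "car.bus.bike.vehicle",
    "car.bus.bike",
    "_car.bus.bike",
    "'car.bus.bike'",
])

# Replace every match; the colon and quotes stay because they are
# only asserted by the lookahead, never consumed.
result = PATTERN.sub("cat.mouse.dog", sample)
print(result)
```

Note that the pattern matches any three-segment dotted name, not only car.bus.bike. If only that literal name should be replaced, the segments could be spelled out instead.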

CodePudding user response:

I assume that the string must begin with a letter or a single quote. If we are confident that a string beginning with a single quote also ends with one, we can substitute 'cat.mouse.dog' for a match of the following regular expression.

^'?\p{L}+(?:\.\p{L}+){2}(?!\.vehicle$)[.:']?$

Demo

This expression can be broken down as follows (and/or hover the cursor over each part of the expression at the link to obtain an explanation of its function).

^             # match beginning of string
'?            # optionally match a single quote
\p{L}+        # match one or more unicode letters
(?:           # begin non-capture group
  \.\p{L}+    # match a period followed by one or more unicode letters
){2}          # end non-capture group and execute it twice
(?!           # begin negative lookahead
  \.vehicle$  # match '.vehicle' at the end of the string
)             # end negative lookahead
[.:']?        # optionally (?) match one of the three chars in the char class
$             # match end of string
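
To see which of the question's sample lines this expression accepts, a small Python sketch may help. It assumes [^\W\d_]+ as an approximate substitute for \p{L}+, since the built-in re module has no \p{L}; the variable names are illustrative only.

```python
import re

# Assumption: [^\W\d_]+ approximates \p{L}+ in the stdlib re module.
LINE = re.compile(r"^'?[^\W\d_]+(?:\.[^\W\d_]+){2}(?!\.vehicle$)[.:']?$")

samples = [
    "car.bus.bike:",
    "car.bus.bike.vehicle",
    "car.bus.bike",
    "_car.bus.bike",
    "'car.bus.bike'",
]

# Each sample is tested as a whole string, so ^ and $ need no multiline flag.
matched = [s for s in samples if LINE.search(s)]
print(matched)
```

This sketch only reports matches. Because the expression consumes the whole string, including any quote or trailing colon, a replacement would need to put those characters back.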

If we wish to ensure that the string begins with a single quote if and only if it also ends with one, we need to modify the regular expression. One way of doing that is to insert the following positive lookahead after the beginning-of-string anchor (^).

(?='[^']+'$|[^']+$)

Demo

(At the link I set the multiline flag and changed the two instances of [^'] to [^'\n] in order to demonstrate which of several strings were matched by the expression.)
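
The effect of the quote-balancing lookahead can be sketched in Python as follows. Again [^\W\d_]+ is assumed as an approximate stand-in for \p{L}+, and the test strings with a single unpaired quote are hypothetical inputs added for illustration.

```python
import re

# Lookahead inserted right after ^: either quoted on both sides or no quote at all.
# Assumption: [^\W\d_]+ approximates \p{L}+ in the stdlib re module.
BALANCED = re.compile(
    r"^(?='[^']+'$|[^']+$)"
    r"'?[^\W\d_]+(?:\.[^\W\d_]+){2}(?!\.vehicle$)[.:']?$"
)

candidates = [
    "'car.bus.bike'",  # quote on both sides
    "car.bus.bike:",   # no quotes
    "'car.bus.bike",   # opening quote only (hypothetical input)
    "car.bus.bike'",   # closing quote only (hypothetical input)
]

results = {s: bool(BALANCED.search(s)) for s in candidates}
for text, ok in results.items():
    print(f"{text!r}: {ok}")
```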

To restrict matches to English letters, replace \p{L} with [a-z] and set the case-insensitive flag (e.g., add (?i) at the beginning).
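
A minimal Python sketch of that variant, with (?i) placed at the very start of the pattern; the two test strings are made up for illustration.

```python
import re

# [a-z] plus the inline (?i) flag replaces the Unicode letter class.
ENGLISH = re.compile(r"(?i)^'?[a-z]+(?:\.[a-z]+){2}(?!\.vehicle$)[.:']?$")

accepts_mixed_case = bool(ENGLISH.search("Car.Bus.Bike:"))
rejects_non_english = not ENGLISH.search("çar.bus.bike")
print(accepts_mixed_case, rejects_non_english)
```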
