Regex pattern different behavior during the last repetition

Time:11-13

I have the string word1.word2.word3,word1.word2.word3.word4.word5,word1.word2 and I would like to write a regular expression that matches it. The constraint is that a group contains 0 to 5 words separated by periods, groups are separated by commas, and there can be at most three groups.

My attempt matches the string, but only when it has an extra comma at the end. This is my regular expression:

^(((?:\w+){1}(?:\.(?:\w+)){0,4})(?:\,{1})){0,3}$

It matches this string: word1.word2.word3,word1.word2.word3.word4.word5,word1.word2,

My question is: how can I change the expression so that the final comma is not required?
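
To see the problem concretely, the pattern can be checked against both forms of the string. This is only a minimal sketch using Python's re module; the choice of language and the variable names are assumptions, not part of the question:

```python
import re

# The pattern from the question: every group must be followed by a comma.
attempt = re.compile(r"^(((?:\w+){1}(?:\.(?:\w+)){0,4})(?:\,{1})){0,3}$")

wanted = "word1.word2.word3,word1.word2.word3.word4.word5,word1.word2"

# The pattern accepts the string only when a trailing comma is appended.
with_comma = attempt.match(wanted + ",") is not None
without_comma = attempt.match(wanted) is not None
print(with_comma, without_comma)
```

Because the comma sits inside the repeated group, the last group cannot match without it.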

CodePudding user response:

You can use

^(?:\w+(?:\.\w+){0,4}(?:,(?!$)|$)){0,3}$

See the regex demo. Details:

  • ^ - start of string
  • (?:\w+(?:\.\w+){0,4}(?:,(?!$)|$)){0,3} - zero, one, two or three occurrences of:
    • \w+ - one or more word chars
    • (?:\.\w+){0,4} - zero to four occurrences of . and one or more word chars
    • (?:,(?!$)|$) - either a comma not at the end of string, or end of string
  • $ - end of string.
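
The behaviour described in the list above can be checked with a short script. This is a sketch in Python's re module (an assumed choice of engine; the helper name is_valid is hypothetical):

```python
import re

# A comma is accepted only when more text follows it; otherwise the
# group must end exactly at the end of the string.
pattern = re.compile(r"^(?:\w+(?:\.\w+){0,4}(?:,(?!$)|$)){0,3}$")

def is_valid(text: str) -> bool:
    """Return True if text satisfies the pattern above."""
    return pattern.match(text) is not None

wanted = "word1.word2.word3,word1.word2.word3.word4.word5,word1.word2"
print(is_valid(wanted))        # the string from the question
print(is_valid(wanted + ","))  # same string with a trailing comma
```

The lookahead (?!$) is what rejects the trailing comma: a comma directly followed by the end of the string fails the first alternative, and the second alternative cannot match because the position is not yet at the end.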

CodePudding user response:

You could attempt to match the regular expression

^(?:\w+(?:\.\w+){0,4}(?:,\w+(?:\.\w+){0,4}){0,3})?$

PCRE Demo

To understand the operations being performed, hover the cursor over each part of the expression at the link to obtain an explanation of its function.

My understanding of the question is that empty strings are to be matched. That is the reason (and only reason) for the optional outer non-capture group. (I could alternatively have used an alternation: ^$|^\w+...{0,3})?$.) If empty strings are not to be matched, that non-capture group can be removed:

^\w+(?:\.\w+){0,4}(?:,\w+(?:\.\w+){0,4}){0,3}$
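
The difference between the two variants can be tried out as follows. This is a sketch in Python's re module rather than PCRE (an assumption; both patterns use only syntax that re also accepts), and the variable names are hypothetical:

```python
import re

# Variant with the optional outer group: the empty string is accepted.
allow_empty = re.compile(r"^(?:\w+(?:\.\w+){0,4}(?:,\w+(?:\.\w+){0,4}){0,3})?$")

# Variant without the outer group: at least one word is required.
non_empty = re.compile(r"^\w+(?:\.\w+){0,4}(?:,\w+(?:\.\w+){0,4}){0,3}$")

wanted = "word1.word2.word3,word1.word2.word3.word4.word5,word1.word2"

for text in ("", wanted, wanted + ",", "a,b,c,d"):
    print(repr(text), bool(allow_empty.match(text)), bool(non_empty.match(text)))
```

One thing to watch: as written, the first group plus up to three comma-prefixed groups admits four groups in total, so "a,b,c,d" is accepted. If the limit of three is meant to apply to the groups themselves, the inner quantifier would presumably be {0,2}.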

Notice the repetition signified by the party hats:

^(?:\w+(?:\.\w+){0,4}(?:,\w+(?:\.\w+){0,4}){0,3})?$
    ^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^^^^

This makes this expression a good candidate for using a subroutine (or subexpression) if it is supported by the regex engine being used. For PCRE this would be

^(?:(\w+(?:\.\w+){0,4})(?:,(?1)){0,3})?$

if a numbered capture is used. If a named capture group is preferred, one might write

^(?:(?P<words_sep_by_periods>\w+(?:\.\w+){0,4})(?:,(?P>words_sep_by_periods)){0,3})?$

(?1) ((?P>words_sep_by_periods)) causes the engine to execute, at that location, the earlier regex code that saves a match to capture group 1 (words_sep_by_periods).

PCRE demo with subroutine

The use of subroutines generally makes regex code easier to understand (my opinion, anyway) and reduces the chance of errors being introduced when constructing the expression.
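
Where the engine has no subroutine calls, a similar effect can be approximated by building the pattern from a named fragment. Python's standard re module, for instance, does not support (?1), so the sketch below swaps in plain string composition instead; it is not a subroutine call, and the constant name is hypothetical:

```python
import re

# Hypothetical name for the repeated piece: 1 to 5 words joined by periods.
WORDS_SEP_BY_PERIODS = r"\w+(?:\.\w+){0,4}"

# The fragment is written once and interpolated twice. Doubled braces
# ({{0,3}}) produce a literal {0,3} quantifier inside the f-string.
pattern = re.compile(
    rf"^(?:{WORDS_SEP_BY_PERIODS}(?:,{WORDS_SEP_BY_PERIODS}){{0,3}})?$"
)

print(pattern.pattern)
```

The compiled pattern is the same text as the fully written-out expression, so the fragment only needs to be edited in one place.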
