Regular expression to avoid repeated consecutive chars, while ignoring one specific char

Time:07-04

I am learning a lot about regex with Python, and I am struggling with a regex that should accept only strings that don't contain 4 or more consecutive repeated chars. I have found easy solutions on the Internet, but I need to evaluate the repeated chars while ignoring the "-" char, for example:

bic-hota  // Valid
bichota  // Valid
bichota1234 // Valid
bich222ota // Valid, the sequence "222" has length 3, which is allowed

bichota2222 // Invalid because it contains "2222", and a run of length 4 or more isn't allowed
bichota22-22 // Invalid because it contains "2222" (ignoring the '-'), and a run of length 4 or more isn't allowed

In the last test case you can see the sequence "22-22". I need to ignore the "-" inside the sequence so that the whole sequence is validated. I searched for how to solve this and tried the non-capturing group (?:-?), but it doesn't work :(

^(?![\w]*([\w])(?:-?)\1{3,})[\w-]+$

What is wrong with my regex, and how do I fix it?

Test cases: https://regexr.com/6p0ca

CodePudding user response:

You may use this regex:

^(?!.*(\w)(?:-?\1){3}).+


RegEx Details:

  • ^: Start
  • (?!: Start negative lookahead
    • .*: Match any length of characters
    • (\w): Match a character and capture in group #1
    • (?:-?\1){3}: Match optional - followed by back-reference to group #1. Repeat this group 3 times.
  • ): End negative lookahead
  • .+: Match 1 or more of any character
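As a quick check, here is a minimal Python sketch that runs this pattern against the test cases from the question. The helper name `is_valid` is made up for illustration, and `re.match` is assumed to be an acceptable way to apply the pattern, since it already anchors itself with `^`.

```python
import re

# Reject any string where a word character occurs 4 times in a row,
# with an optional "-" allowed before each repetition.
PATTERN = re.compile(r'^(?!.*(\w)(?:-?\1){3}).+')


def is_valid(text):
    """Hypothetical helper: True if text has no run of 4+ equal chars (ignoring '-')."""
    return PATTERN.match(text) is not None


# Test cases taken from the question.
for case in ["bic-hota", "bichota", "bichota1234", "bich222ota",
             "bichota2222", "bichota22-22"]:
    print(case, "->", "Valid" if is_valid(case) else "Invalid")
```

Note that `\w` also matches the underscore, so a string such as "a____b" would be rejected by this sketch as well.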

To allow only word characters and hyphens use:

^(?![\w-]*(\w)(?:-?\1){3})[\w-]+$
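To see where the pattern from the question goes wrong, the sketch below compares it with the stricter pattern. In the original, (?:-?) allows a single hyphen only directly after the captured character, and \1{3,} then requires three repeats with no hyphen between them, so "22-22" is never detected. The names `ORIGINAL`, `STRICT` and `accepts` are chosen for illustration only.

```python
import re

# Pattern from the question: the optional "-" sits outside the repeated part.
ORIGINAL = re.compile(r'^(?![\w]*([\w])(?:-?)\1{3,})[\w-]+$')
# Stricter pattern: the optional "-" is repeated together with the back-reference.
STRICT = re.compile(r'^(?![\w-]*(\w)(?:-?\1){3})[\w-]+$')


def accepts(pattern, text):
    """Hypothetical helper: True if the compiled pattern matches the whole text."""
    return pattern.match(text) is not None


print(accepts(ORIGINAL, "bichota22-22"))  # True: the run "22-22" slips through
print(accepts(STRICT, "bichota22-22"))    # False: the hyphen inside the run is skipped
print(accepts(STRICT, "2-2-2-2"))         # False: a hyphen is allowed before every repeat
print(accepts(STRICT, "bich ota"))        # False: a space is neither \w nor "-"
print(accepts(STRICT, "bic-hota"))        # True
```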