Matching consecutive digits in regex while ignoring dashes in python3 re

Time:10-25

I'm working to advance my regex skills in Python, and I've come across an interesting problem. Let's say that I'm trying to match valid credit card numbers, and one of the requirements is that the number cannot contain the same digit 4 or more times in a row. 1234-5678-9101-1213 is fine, but 1233-3345-6789-1011 is not (ignoring the dash, it contains 3333). I currently have a regex that works when there are no dashes, but I want it to work in both cases, or at least in a way that lets me use | to match either form. Here is what I have for consecutive digits so far:

validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})')

I know I could do some sort of replace '-' with '', but in an effort to make my code more versatile, it would be easier as just a regex. Here is the function for more context:

def isValid(number):
    validStart = re.compile(r'^[456]') # Starts with 4, 5, or 6
    validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$') # is 16 digits long
    validOnlyDigits = re.compile(r'^[0-9-]*$') # only digits or dashes
    validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})') # no consecutives over 3
    validators = [validStart, validLength, validOnlyDigits, validNoConsecutive]
    return all([val.search(number) for val in validators])

    
list(map(print, ['Valid' if isValid(num) else 'Invalid' for num in arr]))

I looked into excluding chars and lookahead/lookbehind methods, but I can't seem to figure it out. Is there some way to perhaps ignore a character for a given regex? Thanks for the help!

CodePudding user response:

You can add the (?!.*(\d)(?:-*\1){3}) negative lookahead after ^ (start of string) to add the restriction.

The ^(?!.*(\d)(?:-*\1){3}) pattern matches

  • ^ - start of string
  • (?!.*(\d)(?:-*\1){3}) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
    • .* - zero or more chars other than line break chars, as many as possible
    • (\d) - Group 1: one digit
    • (?:-*\1){3} - three occurrences of zero or more - chars followed by the same digit as captured in Group 1 (\1 is an inline backreference to the Group 1 value), so together with Group 1 this means the same digit four times, with optional dashes in between.
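
As a quick check of the breakdown above, here is a minimal sketch that runs the lookahead on its own against the two numbers from the question plus two more samples. The helper name has_no_four_repeats and the last two samples are assumptions made for illustration; they do not come from the question.

```python
import re

# Negative lookahead from the answer: fail if any digit occurs four times
# in a row, with any number of dashes allowed between the repeats.
no_four_repeats = re.compile(r'^(?!.*(\d)(?:-*\1){3})')

def has_no_four_repeats(number):
    """Hypothetical helper: True when no digit repeats 4+ times, dashes ignored."""
    return no_four_repeats.search(number) is not None

samples = [
    '1234-5678-9101-1213',  # from the question: fine
    '1233-3345-6789-1011',  # from the question: 3333 across the dash
    '4444567890123456',     # assumed sample: 4444 without dashes
    '4443-3356-7890-1234',  # assumed sample: only three 4s and three 3s
]
for s in samples:
    print(s, has_no_four_repeats(s))
```

Because the pattern consists only of ^ and a lookahead, a successful search returns an empty match at position 0, so the helper compares the result with None rather than relying on the matched text.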


If you want to combine this pattern with others, just put the lookahead right after ^. If the combined pattern has other capturing groups before the lookahead, adjust the \1 backreference to the new group number. E.g. combining it with your second regex, validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$'), it will look like

validLength = re.compile(r'^(?!.*(\d)(?:-*\1){3})(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})$')
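
A minimal sketch of how the combined pattern might slot back into the validator from the question. The names is_valid, valid_start, and valid_length, as well as the sample numbers, are assumptions for illustration. The separate "only digits or dashes" check is left out here because the two anchored layouts already admit nothing else.

```python
import re

# Starts with 4, 5, or 6 (taken from the question).
valid_start = re.compile(r'^[456]')

# Combined pattern from the answer: the repeat-digit lookahead plus the
# two allowed layouts (16 digits, or four dash-separated groups of 4).
valid_length = re.compile(
    r'^(?!.*(\d)(?:-*\1){3})'
    r'(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})$'
)

def is_valid(number):
    """Sketch of the validator: every pattern must find a match."""
    return all(p.search(number) for p in (valid_start, valid_length))

# Assumed sample inputs, not from the original post.
for num in ['4123456789123456', '5123-4567-8912-3456',
            '5133-3367-8912-3456', '0525362587961578']:
    print(num, 'Valid' if is_valid(num) else 'Invalid')
```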