Regular expression for SSN without all consecutive numbers

Time:06-11

I'm working on a regular expression for SSN with the rules below. I have successfully applied all matching rules except #7. Can someone help alter this expression to include the last rule, #7:

^((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$|(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}$)
  1. Hyphens should be optional (this is handled above by using 2 expressions with an OR)
  2. Cannot begin with 000
  3. Cannot begin with 666
  4. Cannot begin with 900-999
  5. Middle digits cannot be 00
  6. Last four digits cannot be 0000
  7. Cannot be all the same numbers ex: 111-11-1111 or 111111111

CodePudding user response:

Add the following negative look ahead anchored to start:

^(?!(.)(\1|-)+$)


This captures the first character then asserts the rest of the input is not made of that captured char or hyphen.

The whole regex can be shortened to:

^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$


The main trick to avoid repeating the regex both with and without hyphens is to capture the optional hyphen (as group 3), then use a back reference \3 to that capture at the next hyphen position, so the hyphens are either both present or both absent.
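As a quick check of the shortened pattern, here is a minimal Python sketch. It assumes the quantifier in the first lookahead is `+`; the helper name `is_valid_ssn` and the sample values are made up for illustration.

```python
import re

# Shortened pattern from the answer above. Groups 1 and 2 live inside the
# first lookahead, so the optional hyphen is group 3 and is reused via \3.
SSN = re.compile(
    r"^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)"
    r"\d{3}(-?)\d\d\3\d{4}$"
)


def is_valid_ssn(text: str) -> bool:
    """Hypothetical helper: True if text satisfies the pattern."""
    return SSN.match(text) is not None


if __name__ == "__main__":
    samples = [
        "123-45-6789",  # both hyphens present
        "123456789",    # both hyphens absent
        "123-456789",   # only the first hyphen: \3 cannot match
        "12345-6789",   # only the second hyphen
        "111-11-1111",  # one digit repeated
    ]
    for sample in samples:
        print(f"{sample!r}: {is_valid_ssn(sample)}")
```

Because `(-?)` can capture an empty string, `\3` then matches an empty string as well, which is what lets one pattern cover both spellings.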

CodePudding user response:

First, let's shorten the pattern, as it contains two nearly identical alternatives: one matching the SSN with hyphens, and the other matching it without hyphens. Instead of a ^(x-y-z$|xyz$) pattern, you can use a ^x(-?)y\1z$ pattern, so your regex reduces to ^(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$.

To make a pattern never match a string that contains only identical digits, you may add the following negative lookahead right after ^:

(?!\D*(\d)(?:\D*\1)*\D*$)

It fails the match if the whole string consists of:

  • \D* - zero or more non-digits
  • (\d) - a digit (captured in Group 1)
  • (?:\D*\1)* - zero or more occurrences of zero or more non-digits followed by the same digit as in Group 1, and then
  • \D*$ - zero or more non-digits till the end of string.
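The lookahead can be tried in isolation. The sketch below is in Python and compares it against a plain set-based check for ASCII input; the function names `passes_lookahead` and `uses_one_digit` are hypothetical, chosen only for this example.

```python
import re

# The negative lookahead from above, anchored at the start of the string.
NOT_ONE_DIGIT = re.compile(r"^(?!\D*(\d)(?:\D*\1)*\D*$)")


def passes_lookahead(text: str) -> bool:
    """True if the negative lookahead lets the string through."""
    return NOT_ONE_DIGIT.match(text) is not None


def uses_one_digit(text: str) -> bool:
    """Cross-check without regex: exactly one distinct ASCII digit appears."""
    return len({ch for ch in text if "0" <= ch <= "9"}) == 1


if __name__ == "__main__":
    for sample in ["111-11-1111", "111111111", "112-11-1111", "123-45-6789"]:
        print(f"{sample!r}: lookahead passes = {passes_lookahead(sample)}")
```

For ASCII strings the two checks should be exact opposites: the lookahead blocks a string precisely when all of its digits are the same, regardless of where the non-digit separators sit.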

Now, since I suggested shortening the regex to the pattern with backreference(s), you will have to adjust the backreferences after adding this lookahead.

So, your solution looks like

^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$
^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$

Note that the \1 from the pattern without the lookahead became \2, because the lookahead adds a capturing group in front and (-?) is now Group 2.
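If the flavor supports named groups, the renumbering can be avoided altogether. The following Python sketch is a variation, not part of the answer itself: it rewrites the same pattern with the named groups `digit` and `sep` (names invented here) and checks that it agrees with the numbered version on a few made-up samples.

```python
import re

# Numbered version, as given above: (-?) is group 2, referenced by \2.
NUMBERED = re.compile(
    r"^(?!\D*(\d)(?:\D*\1)*\D*$)"
    r"(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$"
)

# Named variant (an assumption of this sketch): the backreferences no longer
# depend on how many groups precede them.
NAMED = re.compile(
    r"^(?!\D*(?P<digit>\d)(?:\D*(?P=digit))*\D*$)"
    r"(?!000|666)[0-8]\d{2}(?P<sep>-?)(?!00)\d{2}(?P=sep)(?!0000)\d{4}$"
)

SAMPLES = [
    "123-45-6789", "123456789", "899-01-0001",   # expected to match
    "123-456789", "111-11-1111", "888888888",    # expected to fail
    "000-12-3456", "666-12-3456", "900-12-3456",
    "123-00-4567", "123-45-0000",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        numbered = NUMBERED.match(sample) is not None
        named = NAMED.match(sample) is not None
        print(f"{sample!r}: numbered={numbered} named={named}")
```

Named groups are spelled differently in other flavors, so treat the `(?P<name>...)` syntax as specific to Python.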


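Note also that in some regex flavors \d is not equal to [0-9]: it can also match decimal digits from other scripts (for example in .NET, or in Python 3 str patterns without the re.ASCII flag). The second pattern above, written with [0-9], avoids this.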
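Python is one flavor where the difference shows up. A small sketch, using the Arabic-Indic digit three (U+0663) as an arbitrary example of a non-ASCII decimal digit:

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a decimal digit outside the ASCII range

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_d = re.fullmatch(r"\d", ARABIC_INDIC_THREE)

# An explicit character class only accepts the ten ASCII digits.
ascii_class = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE)

# With re.ASCII, \d is restricted to [0-9] as well.
ascii_d = re.fullmatch(r"\d", ARABIC_INDIC_THREE, flags=re.ASCII)

if __name__ == "__main__":
    print(unicode_d is not None, ascii_class is not None, ascii_d is not None)
```

For SSN validation, either the [0-9] spelling or the re.ASCII flag keeps such characters out.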
