My regex validates first and last names. The acceptable forms are as follows:
- Jacob Wellman
- Wellman, Jacob
- Wellman, Jacob Wayne
- O’Shaughnessy, Jake L.
- John O’Shaughnessy-Smith
- Kim
The unacceptable forms are as follows:
- Timmy O’’Shaughnessy
- John O’Shaughnessy--Smith
- K3vin Malone
- alert(“Hello”)
- select * from users;
My current regex is as follows.
^[\w'\-,.][^0-9_!¡?÷?¿\\+=@#$%ˆ&*(){}|~<>;:[\]]{2,}$
It works properly for validating all of the names except for:
- Timmy O’’Shaughnessy
- John O’Shaughnessy--Smith
The reason for this is that the regex doesn't take into account consecutive identical special characters. How can I change my regex to take those into account?
CodePudding user response:
You can exclude consecutive identical characters with a negative lookahead that uses a backreference. It asserts that nowhere in the string is one of the listed characters directly followed by the same character: ^(?!.*([’-])\1)
Note that your current pattern only matches names that are at least 3 characters long, so it will not match names like Al.
If you want to match those as well, you can change {2,} to + in the pattern.
^(?!.*([’-])\1)[\w',.-][^\n\r0-9_!¡?÷¿\\+=@#$%ˆ&*(){}|~<>;:[\]]{2,}$
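As a rough check of the lookahead approach, here is a minimal Python sketch that runs the pattern against the names from the question. The question does not name a language, so Python is an assumption. The sketch also assumes the character after \\ in the excluded set is +, since a space there would reject Jacob Wellman. The helper is_valid_name and the relaxed variant SHORT_NAME_RE (with + instead of {2,}) are illustrative names, not part of the original answer.

```python
import re

# Excluded set from the question; assumes "+" (not a space) follows the backslash.
EXCLUDED = r"[^\n\r0-9_!¡?÷¿\\+=@#$%ˆ&*(){}|~<>;:[\]]"

# Negative lookahead: fail if ’ or - is directly followed by the same character.
NO_DOUBLES = r"(?!.*([’-])\1)"

# Pattern as given in the answer: one leading character plus at least two more.
NAME_RE = re.compile(r"^" + NO_DOUBLES + r"[\w',.-]" + EXCLUDED + r"{2,}$")

# Hypothetical relaxed variant: "+" instead of "{2,}" also admits two-character names.
SHORT_NAME_RE = re.compile(r"^" + NO_DOUBLES + r"[\w',.-]" + EXCLUDED + r"+$")


def is_valid_name(name: str, pattern: re.Pattern = NAME_RE) -> bool:
    """Return True if the whole name matches the given pattern."""
    return pattern.match(name) is not None


accepted = [
    "Jacob Wellman",
    "Wellman, Jacob",
    "Wellman, Jacob Wayne",
    "O’Shaughnessy, Jake L.",
    "John O’Shaughnessy-Smith",
    "Kim",
]
rejected = [
    "Timmy O’’Shaughnessy",
    "John O’Shaughnessy--Smith",
    "K3vin Malone",
    "alert(“Hello”)",
    "select * from users;",
]

for name in accepted:
    print(f"should accept {name!r}: {is_valid_name(name)}")
for name in rejected:
    print(f"should reject {name!r}: {is_valid_name(name)}")

# "Al" has only two characters, so only the relaxed variant matches it.
print(is_valid_name("Al"), is_valid_name("Al", SHORT_NAME_RE))
```

The lookahead only lists ’ and -, so any other character that must not be doubled would have to be added to that set.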
Matching names can be difficult. This page is an interesting read on the subject:
Falsehoods Programmers Believe About Names
CodePudding user response:
^(?:[^0-9'\-\., _!¡?÷?¿\\+=@#$%ˆ&*(){}|~<>;:[\]]+(?:['-]|, | |\.|\. |$))+$
I used your forbidden character set and added '\-\., and the space to it, then let it repeat with +.
After it I put a group of allowed separators, (?:['-]|, | |\.|\. |$), and let the whole pattern repeat with +.
I tried it here.
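A minimal Python sketch of this segment-and-separator idea follows (Python is an assumption, as the thread names no language). It assumes the quantifiers in the pattern are + and that the groups are non-capturing (?:...) groups. It also uses straight apostrophes in the test names, because the separator group lists ' and not the typographic ’. The helper name is_segmented_name is illustrative.

```python
import re

# Each iteration is a run of allowed name characters followed by exactly one
# separator (or the end of the string). The "+" quantifiers and "(?:" groups
# are assumptions about what the answer intended.
SEGMENTED_RE = re.compile(
    r"^(?:[^0-9'\-\., _!¡?÷?¿\\+=@#$%ˆ&*(){}|~<>;:[\]]+"  # run of name characters
    r"(?:['-]|, | |\.|\. |$))+$"                           # one separator, or the end
)


def is_segmented_name(name: str) -> bool:
    """Return True if the name is a sequence of segments with single separators."""
    return SEGMENTED_RE.match(name) is not None


samples = [
    "John O'Shaughnessy-Smith",   # single separators between segments
    "O'Shaughnessy, Jake L.",
    "Kim",
    "Timmy O''Shaughnessy",       # doubled apostrophe
    "John O'Shaughnessy--Smith",  # doubled hyphen
    "K3vin Malone",               # digit inside a segment
]

for name in samples:
    print(f"{name!r}: {is_segmented_name(name)}")
```

Because every separator must be followed by at least one name character or the end of the string, two separators in a row can never match, which is what rules out the doubled apostrophe and hyphen.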
CodePudding user response:
You could do it as a separate step before your validation. With a Perl regex that collapses repeated special characters into a single one, it would be:
s/(\W)\1+/$1/g
so for example:
$ echo "John O’’Shaughnessy--Smith" | perl -C -pe 's/(\W)\1+/$1/g'
John O’Shaughnessy-Smith
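The same clean-up step can be sketched in Python (an assumption, for readers not using Perl) with re.sub and the same backreference. The function name collapse_repeats is illustrative.

```python
import re


def collapse_repeats(text: str) -> str:
    """Collapse a run of one repeated non-word character into a single occurrence."""
    # (\W) captures one non-word character; \1+ matches one or more repeats of it.
    return re.sub(r"(\W)\1+", r"\1", text)


print(collapse_repeats("John O’’Shaughnessy--Smith"))
```

Note that \W also covers whitespace, so doubled spaces are collapsed as well.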