I have the regex below to find lastName, firstName, and middleName written with dots, without dots, with spaces, without spaces, etc. How can I improve my regex so that it matches all my examples without issues?
[А-Я] [а-я]*\s [А-Я]\.*[а-я]*\.*\s*[А-Я]*[а-я]*\.*\,*
Here you can see the first name in green, the middle name (if any) in blue, and the surname in orange. The pattern does this solely based on these assumptions:
- the first name is a capital letter, followed by lowercase letters, and separated from further names by a single space
- there are one or two names following this first name
- these later names may either take the form of the first name, or be a single capital letter followed by a space, a period, or another name
- the end of the name is recognisable only by the end of a line, some other word (something beginning with a lowercase letter), or a non-word, non-whitespace character
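
The assumptions above could be sketched in Python roughly as follows. This is not the pattern from the screenshot, only one possible reading of the list. The group names `first`, `second`, `third` and the helper `parse_name` are hypothetical, and adding `Ё`/`ё` to the character classes is my own assumption (the ranges `[А-Я]` and `[а-я]` do not include them).

```python
import re

# Building blocks (Ё/ё added by assumption; [А-Я] and [а-я] exclude them).
CAP = "[А-ЯЁ]"
LOW = "[а-яё]"
FULL = f"{CAP}{LOW}+"             # capital letter followed by lowercase letters
INITIAL = rf"{CAP}\.?"            # single capital letter, optionally with a period
LATER = f"(?:{FULL}|{INITIAL})"   # a later name: full form or an initial

NAME_RE = re.compile(
    rf"(?P<first>{FULL}) "            # first name, then a single space
    rf"(?P<second>{LATER})"           # one later name is required
    rf"(?: ?(?P<third>{LATER}))?"     # a second later name is optional
    rf"(?![А-ЯЁа-яё])"                # the name must not run on into more letters
)

def parse_name(text):
    """Return (first, second, third) for the first name found, or None."""
    m = NAME_RE.search(text)
    if m is None:
        return None
    return m.group("first"), m.group("second"), m.group("third")

print(parse_name("Иванов Иван Иванович"))
print(parse_name("Иванов И.И."))
print(parse_name("Иванов Иван пошёл домой"))
```

The groups are positional rather than labelled as surname or given name, because the pattern alone cannot tell which is which. A capitalised word at the start of a sentence would also be picked up as a name, which is the kind of limitation described below.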
But outside of a toy for learning, or perhaps a highlighting aid for human reading, it would never be perfect. For that you would need an actual language parser: something that understands not just names, but all the other words and the syntax between them.