I am new to regex and I have been going round and round on this problem.
The question "PHP: Check alphabetic characters from any latin-based language?" gives a brilliant regex for checking that a string contains only characters in the Latin script, which is part of what I need:
^\p{Latin}+$
It also provides a working example at https://regex101.com/r/I5b2mC/1
If I use the regex in PHP with
echo preg_match('/^\p{Latin}+$/', $testString);
and $testString contains only Latin letters, the output is 1. If there are any non-Latin letters, the output is 0. Brilliant.
To add numbers, I tried ^\p{Latin}+[[:alnum:]]*$ but that only allows characters in the Latin script OR unaccented letters and numbers (letters without a grave, acute, cedilla, umlaut etc.), as [[:alnum:]] is equivalent to [a-zA-Z0-9].
If you mix any numbers with characters in the Latin script,
echo preg_match('/^\p{Latin}+[[:alnum:]]*$/', $testString);
returns 0. A string made up only of numbers returns 0 too. This can be confirmed by editing the expression at https://regex101.com/r/I5b2mC/1
How do I edit the expression in
echo preg_match('/^\p{Latin}+$/', $testString);
so that it outputs 1 if $testString contains only characters in the Latin script, numbers and/or spaces? For example, I want 1 to be output if $testString is Café ßüs 459.
CodePudding user response:
There are a few things to change:
- Add the u flag to support characters other than ASCII (/^\p{Latin}+$/ => /^\p{Latin}+$/u)
- Create a character class to hold the letter, digit and whitespace patterns (/^\p{Latin}+$/u => /^[\p{Latin}]+$/u)
- Then add the digit and whitespace patterns to that class. If you need to support any Unicode digits, add \d. If you need to support only ASCII digits, add 0-9.
Thus, you can use
preg_match('/^[\p{Latin}\s0-9]+$/u', $testString) // ASCII-only digits
preg_match('/^[\p{Latin}\s\d]+$/u', $testString) // Any Unicode digits
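To see the two variants side by side, here is a minimal sketch that wraps them in a helper. The function name isLatinText and its $asciiDigitsOnly parameter are made up for illustration and are not part of the original answer. It assumes PHP 7+ (for type declarations and the "\u{...}" escape).

```php
<?php
// Hypothetical helper: true when $text consists only of Latin-script
// letters, whitespace and digits (and is not empty).
function isLatinText(string $text, bool $asciiDigitsOnly = true): bool
{
    $pattern = $asciiDigitsOnly
        ? '/^[\p{Latin}\s0-9]+$/u'  // ASCII digits 0-9 only
        : '/^[\p{Latin}\s\d]+$/u';  // any Unicode decimal digit

    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example, invalid UTF-8 in the subject when /u is set).
    return preg_match($pattern, $text) === 1;
}

// Arabic-Indic digits 4, 5, 9, written as escapes to keep the source readable.
$arabicDigits = "Café \u{0664}\u{0665}\u{0669}";

var_dump(isLatinText('Café ßüs 459'));        // bool(true)
var_dump(isLatinText('Привет 459'));          // bool(false): Cyrillic letters
var_dump(isLatinText($arabicDigits));         // bool(false): digits are not 0-9
var_dump(isLatinText($arabicDigits, false));  // bool(true): \d accepts them
```

Comparing the result with === 1 rather than echoing it directly means an error return of false is treated as "no match" instead of printing an empty string.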
Also, \s with the u flag will match any Unicode whitespace characters.
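If that is broader than you want, one possible variation (not from the answer above, just an assumption about what "spaces" might mean for you) is to replace \s with a literal space. The sketch below compares the two on some made-up sample strings, assuming PHP 7+ for the "\u{...}" escape.

```php
<?php
$anyWhitespace = '/^[\p{Latin}\s0-9]+$/u';  // pattern from the answer
$spaceOnly     = '/^[\p{Latin} 0-9]+$/u';   // variation: only U+0020 allowed

// Made-up samples that differ only in the separator character.
$samples = [
    'plain space'    => 'Café 459',
    'tab'            => "Café\t459",
    'no-break space' => "Café\u{00A0}459",
];

foreach ($samples as $label => $text) {
    printf(
        "%-15s any=%d spaceOnly=%d\n",
        $label,
        preg_match($anyWhitespace, $text),
        preg_match($spaceOnly, $text)
    );
}
// plain space     any=1 spaceOnly=1
// tab             any=1 spaceOnly=0
// no-break space  any=1 spaceOnly=0
```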