I am trying to validate the filename or directory portions of a string that will eventually be used in a URL. I want to reject everything except Unicode letters, numbers, and a few other characters, but the regex appears to be returning null bytes.
Given this string as input:
զվարճ?անք9879#jhkjhkhl!kjljlkjlkjj() ======\_ew.html
and this pattern:
/(?![\p{L}]|[\p{N}]|[\._-~])/gu
JavaScript finds a match at each invalid character, but each match appears to be a null byte rather than the character itself.
If I run the opposite and try to match on characters that are ok instead of not ok:
/[\p{L}]|[\p{N}]|[\._-~]/gu
JavaScript returns matches and selects each valid character as expected, no null byte matches.
Each pattern has the /u flag. I don't understand the difference in behavior. I tested this in the latest Chrome (update 100 as of post date), Safari, and Firefox, and they all behave the same.
Is there some flag or operator that the first regex is missing or is this a JavaScript bug / limitation?
CodePudding user response:
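You are not matching, only asserting. A lookahead is zero-width, so each match is an empty string at the position of an invalid character, not a null byte. You can either match a single character right after the assertion and bundle the alternation into a single character class: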
(?![\p{L}\p{N}._-~]).
Or you can match the opposite using a negated character class, which starts with [^ (append + to match one or more such characters at a time):
[^\p{L}\p{N}._-~]
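To make the difference concrete, here is a minimal sketch comparing the assertion-only pattern with the two fixes. The sample input and variable names are made up for illustration, and the hyphen is placed at the end of the class on the assumption that a literal - was intended.

```javascript
// Hypothetical sample input: two letters, '?', a digit, '#'.
const input = 'ab?9#';

// Lookahead only: it asserts at a position but consumes nothing,
// so every match is a zero-length (empty) string.
const assertOnly = /(?![\p{L}\p{N}._~-])/gu;

// Lookahead followed by `.`: consumes the character that failed the check.
const assertThenConsume = /(?![\p{L}\p{N}._~-])./gu;

// Negated character class: matches any single character not in the set.
const negatedClass = /[^\p{L}\p{N}._~-]/gu;

const emptyMatches = input.match(assertOnly);
const consumed = input.match(assertThenConsume);
const negated = input.match(negatedClass);

// Empty matches before '?', before '#', and at the end of the string,
// where nothing follows and the negative lookahead also succeeds.
console.log(emptyMatches); // [ '', '', '' ]
console.log(consumed);     // [ '?', '#' ]
console.log(negated);      // [ '?', '#' ]
```

One difference between the two fixes: without the /s flag, . does not match line terminators, while the negated character class does.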
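Note that the _-~ part of the character class denotes a range, not the three characters _, - and ~. The range runs from U+005F to U+007E, so it also admits the backtick, a-z, {, | and }, and it does not include the hyphen (U+002D). If you want to match the - character, you can either escape it or place it at the start or end of the character class, for example [^\p{L}\p{N}._~-].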
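The range pitfall is easy to check directly. The sketch below contrasts the range form with the literal form, then wraps the negated class in a small helper; findInvalidChars is a hypothetical name, and the allowed set (letters, numbers, . _ ~ -) is an assumption based on the pattern in the question.

```javascript
// `_-~` as a range: U+005F through U+007E.
const asRange = /[_-~]/u;
// Hyphen moved to the end: the three literal characters _ ~ -
const asLiterals = /[_~-]/u;

console.log(asRange.test('-'));    // false: the hyphen is outside the range
console.log(asRange.test('{'));    // true: '{' (U+007B) falls inside the range
console.log(asLiterals.test('-')); // true
console.log(asLiterals.test('{')); // false

// Hypothetical helper: returns every character that is not a Unicode
// letter, a Unicode number, or one of . _ ~ -
function findInvalidChars(name) {
  return name.match(/[^\p{L}\p{N}._~-]/gu) || [];
}

console.log(findInvalidChars('զվարճ?անք9879#ew.html')); // [ '?', '#' ]
console.log(findInvalidChars('my-file_v2.html'));        // []
```

An empty result means every character is in the allowed set, so the helper can double as a validity check via findInvalidChars(name).length === 0.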