I am using a fastify server, containing a typescript file that calls a function, which make sure people won't send unwanted characters. Here is the function :
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Latin}\p{Zs}\p{M}\p{Nd}\-\'\s]/gu;
function secure(text:string) {
return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But when I try to launch my server, I got an error message : "Invalid regular expression - Invalid property name in character class".
It used to work just fine with my previous regex :
const SAFE_STRING_REPLACE_REGEXP = /[^0-9a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð\-\s\']/g;
function secure(text:string) {
return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But I have been told it wasn't optimized enough. I have also been told it's better to use split/join than regex/replace in matter of performances, but I don't know if I can use it in my case.
CodePudding user response:
You need to use
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
// or
const SAFE_STRING_REPLACE_REGEXP = /[^\p{sc=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
You need to prefix scripts with sc=
or Script=
in Unicode category classes, so \p{Latin}
should be specified as \p{Script=Latin}
. See the ECMAScript reference.
Also, when you use the u
flag, you cannot escape non-special chars, so do not escape '
and better move the -
char to the end of the character class.
You can use split&join, too:
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/u;
console.log("Ącki-Łał русский!!!中国".split(SAFE_STRING_REPLACE_REGEXP).join(""))
Note you don't need the g
modifier with split
, it is the default behavior.