I want to remove all extra spaces and signs from a string and lowercase it (in other words, I want to simplify the string) with a function. The following function does this perfectly:
console.log(simplify(' The very optiMal! FUNCTION, {here] ...'));
function simplify(string) {
return string.toLowerCase().replace(/[^A-Za-z0-9'_]+/g, " ").trim();
}
But the issue is that I want to exclude an array of signs so that they are not removed from the string:
const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
So if any of the above signs appears in the string, it should be left intact and not removed.
How would you do this?
CodePudding user response:
You can use
const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp("(" + exclusion + ")|(?:(?!" + exclusion + ")[^A-Za-z0-9'_])+", "g");
function simplify(string) {
return string.toLowerCase().replace(regex, (x,y) => y || " ").trim();
}
console.log(simplify(' The very optiMal! FUNCTION, {here] ...'));
Details:
- (<exclusion>) - Group 1 with the exclusion patterns
- | - or
- (?:(?!<exclusion>)[^A-Za-z0-9'_])+ - one or more (as many as possible) chars other than an ASCII alphanumeric, underscore or ', none of which starts any of the exclusion patterns (since some of them are multi-character, they cannot simply be added to the original negated character class).
The replacement is the Group 1 contents if Group 1 matches, else, the replacement is a space.
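To see which branch handles each match, here is a minimal sketch that traces the matches on part of the sample input. The names `escapeRegExp`, `trace` and `result` are only illustrative and are not part of the answer above; `escapeRegExp` is a generic escaping helper standing in for the inline `replace` call.

```javascript
// Illustrative helper (an assumption, not from the answer): escape regex metacharacters.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(escapeRegExp).join('|');
const regex = new RegExp('(' + exclusion + ')|(?:(?!' + exclusion + ")[^A-Za-z0-9'_])+", 'g');

const input = 'optiMal! FUNCTION, {here] ...'.toLowerCase();

// Record, for every match, whether Group 1 (a protected sign) took part in it.
const trace = [];
for (const m of input.matchAll(regex)) {
  const branch = m[1] !== undefined ? 'kept' : 'blank';
  trace.push(branch + ' ' + JSON.stringify(m[0]));
}
console.log(trace);
// [ 'kept "!"', 'blank " "', 'blank ", {"', 'blank "] "', 'kept "..."' ]

// Same replacement logic as in the answer: keep Group 1, otherwise emit one space.
const result = input.replace(regex, (x, y) => y || ' ').trim();
console.log(result); // optimal! function here ...
```

Note how `"] "` stops right before `...`: the negative lookahead refuses to consume a character that starts one of the protected signs, so the sign is left for Group 1 to match.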
Another way to approach the issue - in case you want to always have a space separating each substring - is to use a reverse approach: match what you need and then join with a space:
const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp(exclusion + "|[A-Za-z0-9'_]+", "g");
function simplify(string) {
// match() returns null when nothing matches, so fall back to an empty array
return (string.toLowerCase().match(regex) || []).join(" ");
}
console.log(simplify('this is...'));
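One thing to keep in mind with either approach: regex alternation takes the first alternative that matches, so a longer sign such as `...` must be listed before its prefixes `..` and `.`. The `signs` array in the question is already ordered that way. If the list might come in arbitrary order, a possible safeguard is to sort it by length first. The sketch below assumes that; `buildTokenRegex`, `simplifyTokens` and `shuffled` are hypothetical names used only for illustration.

```javascript
// Illustrative helper (an assumption, not from the answer): escape regex metacharacters.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: build the token regex from signs given in any order.
function buildTokenRegex(signs) {
  const exclusion = [...signs]
    .sort((a, b) => b.length - a.length) // longest signs first
    .map(escapeRegExp)
    .join('|');
  return new RegExp(exclusion + "|[A-Za-z0-9'_]+", 'g');
}

function simplifyTokens(string, signs) {
  // match() returns null when there is no token at all
  const tokens = string.toLowerCase().match(buildTokenRegex(signs));
  return tokens ? tokens.join(' ') : '';
}

const shuffled = ['.', '!', '..', '...', '!!'];
console.log(simplifyTokens('This is...', shuffled));   // this is ...
console.log(simplifyTokens('Wow!! Really', shuffled)); // wow !! really
```

Without the sort, `'.'` would win at every position and `...` would come out as three separate `.` tokens.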