simplify a string excluding an array

Time:11-22

I want to remove all extra spaces and signs from a string and lowercase it (in other words, simplify it) with a function. The following function does this perfectly:

console.log(simplify('   The     very optiMal! FUNCTION, {here] ...'));

function simplify(string) {
    return string.toLowerCase().replace(/[^A-Za-z0-9'_]+/g, " ").trim();
}

But the issue is that I want to exclude an array of signs from being removed from the string:

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];

So if any of the above signs appears in the string, it should stay intact and not be removed.

How would you do this?

CodePudding user response:

You can use

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp("(" + exclusion + ")|(?:(?!" + exclusion + ")[^A-Za-z0-9'_])+", "g");

function simplify(string) {
    return string.toLowerCase().replace(regex, (x,y) => y || " ").trim();
}

console.log(simplify('   The     very optiMal! FUNCTION, {here] ...'));

Details:

  • (<exclusion>) - Group 1 with exclusion patterns
  • | - or
  • (?:(?!<exclusion>)[^A-Za-z0-9'_])+ - one or more (as many as possible) chars other than ASCII alphanumerics, underscores or ' chars, none of which starts one of the exclusion patterns (since some of the patterns are multi-character, they cannot simply be added to the original negated character class).

The replacement is the Group 1 contents if Group 1 matches, else, the replacement is a space.
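To make the Group 1 logic easier to follow, here is a minimal sketch that records what each match is replaced with. The `traceReplacements` helper and its `steps` array are hypothetical names added for illustration; only `signs`, `exclusion` and `regex` come from the answer.

```javascript
const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
// Escape regex metacharacters so every sign is matched literally.
const exclusion = signs.map(s => s.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join('|');
const regex = new RegExp('(' + exclusion + ")|(?:(?!" + exclusion + ")[^A-Za-z0-9'_])+", 'g');

// Hypothetical helper (not part of the answer): logs each match and its replacement.
function traceReplacements(input) {
  const steps = [];
  const output = input.toLowerCase().replace(regex, (match, kept) => {
    // `kept` is Group 1: defined only when an excluded sign matched.
    const replacement = kept || ' ';
    steps.push([match, replacement]);
    return replacement;
  }).trim();
  return { output, steps };
}

const { output, steps } = traceReplacements('optiMal! FUNCTION, {here] ...');
console.log(output); // optimal! function here ...
console.log(steps);  // five [match, replacement] pairs
```

In this run, `!` and `...` go through Group 1 and are kept, while `, {` and `] ` are each collapsed into a single space by the second alternative.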

Another way to approach the issue - in case you want to always have a space separating each substring - is to use a reverse approach: match what you need and then join with a space:

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp(exclusion + "|[A-Za-z0-9'_]+", "g");

function simplify(string) {
    // match() returns null when nothing matches, so fall back to an empty array
    return (string.toLowerCase().match(regex) || []).join(" ");
}

console.log(simplify('this is...'));
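One thing to watch with the match-and-join approach is the order of the alternatives: if `.` were listed before `...`, each dot would be matched separately and joined with spaces. The sketch below sorts the signs longest-first before building the pattern. The sorting step and the names `escapeRegex`, `buildMatcher` and `simplifyByMatching` are assumptions added here, not part of the answer, whose array already lists the longer signs first.

```javascript
// Hypothetical helper names; the escaping pattern itself is the one used in the answer.
const escapeRegex = s => s.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&');

function buildMatcher(signs) {
  // Sort a copy longest-first so '...' is tried before '..' and '.'.
  const alternation = [...signs]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegex)
    .join('|');
  return new RegExp(alternation + "|[A-Za-z0-9'_]+", 'g');
}

function simplifyByMatching(string, signs) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const tokens = string.toLowerCase().match(buildMatcher(signs)) || [];
  return tokens.join(' ');
}

// Deliberately unordered list to show that sorting restores the intended matches.
const unorderedSigns = ['.', '!', '..', '...'];
console.log(simplifyByMatching('This   is... fine!', unorderedSigns)); // this is ... fine !
```

Note that this variant always puts a space before a kept sign (`fine !`), which is the trade-off of joining every token with a space.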
