Why should I add a space at the end of a regular expression in JavaScript?

Time:04-20

const sentence = '%I $am@% a %tea@cher%, &and& I lo%#ve %te@a@ching%;. The@re $is no@th@ing; &as& mo@re rewarding as educa@ting &and& @emp%o@weri@ng peo@ple. ;I found tea@ching m%o@re interesting tha@n any ot#her %jo@bs. %Do@es thi%s mo@tiv#ate yo@u to be a tea@cher!? %Th#is 30#Days&OfJavaScript &is al@so $the $resu@lt of &love& of tea&ching';

To strip the special characters from the sentence, I tried a regular expression like the one below.

let pattern = sentence.replace(/[^\w ]/g, '');
console.log(pattern);

What I don't understand is that when I remove the space after \w, the result text has no spaces at all.

IamateacherandIloveteachingThereisnothingasmorerewardingaseducatingandempoweringpeopleIfoundteachingmoreinterestingthananyotherjobsDoesthismotivateyoutobeateacherThis30DaysOfJavaScriptisalsotheresultofloveofteaching

I have no idea why the spaces are all removed. I thought that with a space after \w, the spaces in the text would also be replaced, but they weren't. Is it common syntax to add a space after a regular expression? I want to know why there should be a space after \w.

CodePudding user response:

You are using a negated (complemented) character class. Inside square brackets, a leading ^ means "not", so the call replaces every character that is neither a word character nor a space with the empty string.

Here is an explanation of groups and ranges for regular expressions. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges

CodePudding user response:

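In regex, [^...] matches any single character that is not listed inside the square brackets. So [^\w ] (with the space) matches anything that is neither a \w character nor a space; in your case it matches only the special characters. If you delete the space, [^\w] matches anything that is not a \w character, which includes %, @ and also spaces. Negated character classes behave this way in regex implementations generally, not just in JavaScript.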

Have a look at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet and find the [^xyz] part.
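To see which characters each class actually matches, here is a small sketch using String.prototype.match. The short sample string and the variable names are made up for illustration; they are not from the question.

```javascript
// Hypothetical short sample, a cut-down version of the sentence in the question.
const sample = '%I $am@ a tea@cher';

// Negated class that excludes word characters AND the space:
// only the special characters are matched.
const withSpace = sample.match(/[^\w ]/g);

// Negated class that excludes only word characters:
// the spaces are matched as well.
const withoutSpace = sample.match(/[^\w]/g);

console.log(withSpace);    // [ '%', '$', '@', '@' ]
console.log(withoutSpace); // [ '%', ' ', '$', '@', ' ', ' ', '@' ]
```

Whatever a pattern matches is what replace() removes, which is why the spaces disappear with the second pattern.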

CodePudding user response:

\w matches a word character, equivalent to: [a-zA-Z0-9_]

The ^ means not. So you will remove everything except the characters covered by what is listed in your [].

  1. [^\w ] Find everything that is neither a word character nor a space and remove it.
  2. [^\w] Find everything that is not a word character and remove it.

In option 1, word characters and spaces will remain. In option 2, only word characters will remain.
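The two options side by side, as a minimal sketch on a fragment of the original sentence (the variable names are only illustrative):

```javascript
// A fragment taken from the sentence in the question.
const sample = '%I lo%#ve %te@a@ching%;';

// Option 1: the space is inside the negated class, so spaces survive.
const keepSpaces = sample.replace(/[^\w ]/g, '');

// Option 2: no space in the class, so spaces are removed too.
const dropSpaces = sample.replace(/[^\w]/g, '');

console.log(keepSpaces); // "I love teaching"
console.log(dropSpaces); // "Iloveteaching"
```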

CodePudding user response:

In regex, [\w] matches any word character (equivalent to [a-zA-Z0-9_]).

[^\w] matches every character except [a-zA-Z0-9_]. That means all special characters (excluding _) and spaces are replaced by the empty string.

Since you want to keep the spaces, so that they are not replaced by the empty string, you have to add a space inside the character class. The space is not a special trailing syntax; it is simply one more character listed in the class.
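The space does not have to come last inside the brackets, since order within a character class does not matter. The sketch below also shows \s as a possible alternative if you want to keep every kind of whitespace (tabs, newlines) rather than only the literal space. The sample string is hypothetical, and whether you want \s instead of a plain space depends on your input.

```javascript
// Hypothetical sample containing a tab as well as a space.
const sample = 'tea@ching\tis %fun';

// Space at the end or at the start of the class: same result.
const spaceLast = sample.replace(/[^\w ]/g, '');
const spaceFirst = sample.replace(/[^ \w]/g, '');

// \s keeps all whitespace, so the tab survives as well.
const keepAllWhitespace = sample.replace(/[^\w\s]/g, '');

console.log(spaceLast === spaceFirst);          // true
console.log(JSON.stringify(spaceLast));         // "teachingis fun" (tab removed)
console.log(JSON.stringify(keepAllWhitespace)); // "teaching\tis fun"
```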
