Regex pattern to match all words of a phrase while accepting quote (') in between but NOT in Fi

Time:02-01

I have been looking at this for a couple of days now and still cannot get my head around it.

This is the phrase,

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;

and this is the expected transformation,

["that's", 'the', 'password', 'password', '123', 'cried', 'the', 'special', 'agent', 'so', 'i', 'fled']

The first element (that's) of the array is the problem area.

I can only get the transformation below,

['thats', 'the', 'password', 'password', '123', 'cried', 'the', 'special', 'agent', 'so', 'i', 'fled']

using the code below:

const cleanPhrase = phrase.replace(/["':!,.]/g, '').replace(/[\n]/g, ' ').toLocaleLowerCase()
const words = cleanPhrase.split(' ');

Is there a way to ignore the single quotes around 'PASSWORD 123' but keep the single quote in that's?

CodePudding user response:

I would first replace every quote that is surrounded by word characters with a "placeholder", that is, a character that should not appear anywhere in the string. I used a pipe (|) in the example below.

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;

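const cleanPhrase = phrase
  // Replace every quote that sits between two word characters with a placeholder
  // (the g flag is needed, otherwise only the first one is replaced)
  .replace(/(\w)'(\w)/g, "$1|$2")
  // Remove the remaining punctuation and turn newlines into spaces
  .replace(/["':!,.]/g, "")
  .replace(/[\n]/g, " ")
  // Restore the quotes where there is a placeholder
  .replace(/(\w)\|(\w)/g, "$1'$2")
  .toLocaleLowerCase();
const words = cleanPhrase.split(" ");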

console.log(words);
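
To check the placeholder idea against a phrase with more than one contraction, here is a minimal sketch. The helper name placeholderWords and the sample sentence are made up for illustration, and the pipe is assumed never to occur in the input. It uses a look-ahead for the trailing word character, which is a small variation on the answer's pattern, so that character is not consumed by the match.

```javascript
// Hypothetical helper; assumes "|" never appears in the input text.
function placeholderWords(text) {
  return text
    .replace(/(\w)'(?=\w)/g, "$1|") // protect every apostrophe between word characters
    .replace(/["':!,.]/g, "")       // drop the remaining punctuation, including stray quotes
    .replace(/\n/g, " ")            // newlines become word separators
    .replace(/\|/g, "'")            // restore the protected apostrophes
    .toLowerCase()
    .split(" ");
}

const sample = "Don't say it isn't so.";
console.log(placeholderWords(sample));
```

Without the g flag on the first replace, only the first contraction would be protected and "isn't" would come out as "isnt".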

CodePudding user response:

First replace all the symbols with an empty string, then replace every quote that touches whitespace (a space followed by ', or ' followed by a space) and every \n with a single space:

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
const words = phrase.replace(/["!.:,]/g, '')
  .replace(/\s\'|\'\s|\n/g, ' ')
  .toLocaleLowerCase().split(' ');
console.log(words);

You could also use split instead of the second replace:

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
const words = phrase.toLocaleLowerCase()
  .replace(/["!.:,]/g, '')
  .split(/\s\'|\'\s|\n|\s/g);
console.log(words);
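
Splitting on single whitespace characters leaves empty strings in the result when the input has repeated or leading spaces. A possible variation, sketched under the assumption that quotes only need to be kept inside a word, splits on runs of whitespace and then trims quotes from the ends of each token. The helper name splitWords is hypothetical.

```javascript
// Hypothetical helper: split on whitespace runs, then strip quotes at token edges.
function splitWords(text) {
  return text
    .toLowerCase()
    .replace(/["!.:,]/g, "")               // remove punctuation other than the single quote
    .split(/\s+/)                          // spaces and newlines both separate words
    .map((w) => w.replace(/^'+|'+$/g, "")) // drop leading/trailing quotes, keep inner ones
    .filter(Boolean);                      // discard empty tokens
}

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(splitWords(phrase));
```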

CodePudding user response:

You can use a short solution like

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(phrase.match(/\w+(?:'\w+)*/g).map(x=>x.toLowerCase()));


The /\w+(?:'\w+)*/g regex matches all occurrences (the g flag stands for global) of one or more word chars followed by zero or more sequences of a ' and one or more word chars.
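
Note that String.prototype.match() returns null when a global pattern finds nothing, so calling .map() directly on the result throws for input without any word characters. A minimal sketch of a guarded wrapper follows; the helper name extractWords is hypothetical.

```javascript
// Hypothetical helper: lowercase words, keeping apostrophes only inside a word.
function extractWords(text) {
  const matches = text.match(/\w+(?:'\w+)*/g); // null when there is no match
  return (matches || []).map((w) => w.toLowerCase());
}

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(extractWords(phrase));
console.log(extractWords("!!!")); // no word characters, so the result is an empty array
```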

CodePudding user response:

First, I think it is better to use String.prototype.match() instead of split.
Then there are two simple methods for that:

A) Without using look-behind

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(phrase.match(/(?!')[\w']*\w/g));


B) Using look-behind (Check browser compatibility)

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(phrase.match(/(?!')[\w']+(?<!')/g));

Explanation
  • \w = [a-zA-Z0-9_]
  • [\w'] a character class that matches a word character or a quote (')
    • * zero or more occurrences (of the class)
    • + one or more occurrences (of the class)
  • (?!') negative look-ahead: the match cannot start with a '
  • (?<!') negative look-behind: the match cannot end with a '

Note: In the first method, [\w']* can match zero or more characters, so it is followed by a plain \w (without the quote). That forces the match to end on a word character, which avoids the negative look-behind and still supports one-character words such as I.
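
Neither pattern lowercases its matches, so the output still differs from the expected array in the question. The sketch below runs both patterns through a small hypothetical helper (wordsWith) that lowercases each match, so the two results can be compared on the sample phrase. It says nothing about whether the patterns agree on every possible input.

```javascript
// Hypothetical helper: apply a global pattern and lowercase each match.
function wordsWith(pattern, text) {
  return (text.match(pattern) || []).map((w) => w.toLowerCase());
}

const withoutLookBehind = /(?!')[\w']*\w/g;   // method A: ends on a word character
const withLookBehind = /(?!')[\w']+(?<!')/g;  // method B: needs look-behind support

const phrase = `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`;
console.log(wordsWith(withoutLookBehind, phrase));
console.log(wordsWith(withLookBehind, phrase));
```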
