Can I make my regex split the punctuation marks from my special words?

I have the following string:

"By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}."

I am using the following regex to split the string into words while treating {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}} as whole words:

\s+(?![^\[]*\])

My problem is that my current regex does not split off the full stop at the end of {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}. Ideally, I would like my regex to split off full stops, exclamation marks and question marks. That said, I'm not sure how I would differentiate between a full stop at the end of a word and a full stop that is part of the URL.

CodePudding user response：

You can try a variation of the following regular expression:

\s+(?![^\[]*\])|(?=[\.?!](?![a-zA-Z0-9_%-]))

The new part is the alternative (?=[\.?!](?![a-zA-Z0-9_%-])) at the end. It is a positive lookahead for a period, question mark or exclamation mark, with a nested negative lookahead to make sure the mark is not followed by a URL-like character. Because the lookahead is zero-width, the split happens just before the punctuation mark, so the mark ends up as its own element rather than being dropped. You may need to adjust the bracketed character class to contain the characters you want to treat as part of the URL.
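As a rough check of the explanation above, here is a minimal sketch that applies the pattern to the string from the question. It assumes the whitespace part of the pattern is `\s+`; the names `input`, `splitter` and `tokens` are only illustrative.

```javascript
// The question's string, with two link markers that must stay whole.
const input = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";

// First alternative: whitespace that is not inside [...].
// Second alternative: a zero-width position just before . ? or !
// when that mark is not followed by a URL-like character.
const splitter = /\s+(?![^\[]*\])|(?=[.?!](?![a-zA-Z0-9_%-]))/;

const tokens = input.split(splitter);
console.log(tokens);
```

With this input, the two {{#a}}...{{/a}} markers should come out as single elements and the trailing full stop as a separate final element. Commas are not in the punctuation class, so "in," stays together.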

CodePudding user response：

Instead of .split, you will be better off using .match here with this regex:

/\{\{#a}}.*?\{\{\/a}}/g

This matches {{#a}}, followed by zero or more of any character (matched lazily), followed by {{/a}}.

Alternatively, you can use this stricter regex:

\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}

Here:

\[[^\]]*]: Matches [...] substring
\([^)]*\): Matches (...) substring
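
To illustrate the difference between the two patterns, here is a small sketch run against a made-up string (`text` is hypothetical and not from the question) in which the second {{#a}}...{{/a}} pair does not contain a [label](url) link.

```javascript
// Hypothetical input: the second marker has no [label](url) inside it.
const text = "See {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}not a link{{/a}}.";

// Loose pattern: anything between {{#a}} and the nearest {{/a}}.
const looseMatches = text.match(/\{\{#a}}.*?\{\{\/a}}/g);

// Strict pattern: requires [...] immediately followed by (...) inside the marker.
const strictMatches = text.match(/\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}/g);

console.log(looseMatches.length);  // both markers
console.log(strictMatches.length); // only the well-formed link marker
```

The loose pattern accepts both markers, while the strict one skips the pair that does not wrap a link.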

var string = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";
console.log( string.match(/\{\{#a}}.*?\{\{\/a}}/g) );
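
Note that the snippet above only extracts the two link markers. If the goal is still a full word list with punctuation separated, one possible extension of the .match approach is to add alternatives for punctuation and ordinary words. This is a sketch under that assumption, not part of the original answer; `sentence`, `tokenizer` and `parts` are illustrative names.

```javascript
const sentence = "By signing in, I agree to the {{#a}}[Terms of Use](https://www.example.com/termsofuse){{/a}} and {{#a}}[Privacy Policy](https://www.example.com/privacy){{/a}}.";

// Alternatives are tried in order at each position:
//   1. a whole {{#a}}[label](url){{/a}} marker (the strict pattern),
//   2. a single . ? or ! as its own token,
//   3. any run of characters that are neither whitespace nor . ? !
const tokenizer = /\{\{#a}}\[[^\]]*]\([^)]*\)\{\{\/a}}|[.?!]|[^\s.?!]+/g;

const parts = sentence.match(tokenizer);
console.log(parts);
```

Because the marker alternative comes first, full stops inside the URLs are consumed as part of the marker and never reach the punctuation alternative. One limitation: a full stop inside an ordinary word (for example "e.g.") would be split off as well.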