Finding end of sentence and add double space but account for abbreviations ending in a period

Time:07-29

I have a block of code that is working well and adding in an appropriate double space at the end of a sentence. Where I am running into trouble is forcing the regex to overlook abbreviations such as "U.S."

let report = String(item.Report);
let reportDoubleSpace = report.replace(/[!?.] (?=$|\s)/g, ". ");

I tried adding [^U.S.], but the replacement then removes the last 2 characters of each sentence.

let reportDoubleSpace = report.replace(/[^U.S.][!?.] (?=$|\s)/g, ". ");

I am not sure how to fix this so that it won't grab the last 2 characters and still make an exception for U.S.

CodePudding user response:

There are a couple of issues here.

First of all, you shouldn't use square brackets if you want to match a specific string. Square brackets define a character class, which tells the engine to match any single character listed inside them, not the whole sequence. With the ^ at the beginning the class is negated, so [^U.S.] matches any one character that is not a U, an S, or a period. That character becomes part of the match and is replaced along with the punctuation, which is why the end of each sentence gets eaten. Instead, you should use parentheses. They denote a group, which makes the engine look for the entire string rather than each single character within it. I will talk about the use of the ^ character in a moment.
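To see why characters go missing, here is a minimal sketch of what the negated class actually matches. The sample sentence and the variable names are made up for illustration, and the pattern is shortened to just the class plus the punctuation so the effect is easy to spot.

```javascript
// A negated character class matches exactly ONE character that is not listed.
// [^U.S.] is the same set as [^US.]: any single character except "U", "S" or ".".
const sample = "It rained. We left.";

// The class consumes the letter in front of the punctuation, so that letter
// is part of the match and gets overwritten by the replacement text.
const matches = sample.match(/[^U.S.][!?.]/g);
console.log(matches); // [ 'd.', 't.' ]

const replaced = sample.replace(/[^U.S.][!?.]/g, ".");
console.log(replaced); // "It raine. We lef."
```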

Second, now that you're using parentheses rather than square brackets, the . characters are no longer literal. Instead of representing a period, each one now matches any single character (other than a line break). This can be fixed by escaping each one with a \ placed directly in front of it.
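A quick way to check the difference between an escaped and an unescaped dot is shown below. The test strings are arbitrary examples, not taken from the question.

```javascript
// Outside a character class, "." matches any character; "\." matches a period only.
const unescaped = /U.S./;
const escaped = /U\.S\./;

console.log(unescaped.test("UXSY")); // true: both dots matched ordinary letters
console.log(escaped.test("UXSY"));   // false: literal periods are required
console.log(escaped.test("U.S."));   // true
```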

After both of these changes, the expression should now look like this:

(^U\.S\.)[!?.] (?=$|\s)

Third, let's talk about the ^ character. Outside of square brackets, it anchors the match to the beginning of the string (or of a line when the m flag is set), which is not what we want. Instead, you'll want to replace it with a negative lookbehind, which is a group beginning with ?<!. It tells the engine to match only if the text immediately before the match is not the string in the group. A lookbehind does not consume any characters, so nothing extra gets replaced.
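The two meanings can be compared side by side in a small sketch. The strings here are invented for the demonstration, and the lookbehind example uses single letters to keep it readable. Lookbehind needs a JavaScript engine that supports ES2018 regular expressions.

```javascript
// Inside a group, ^ is still an anchor: the pattern only matches at the start.
const anchored = /(^U\.S\.)/;
console.log(anchored.test("U.S. policy"));     // true
console.log(anchored.test("The U.S. policy")); // false

// A negative lookbehind checks what precedes the match without consuming it.
// Only the "b" that is NOT preceded by "a" is replaced.
const result = "ab cb".replace(/(?<!a)b/g, "X");
console.log(result); // "ab cX"
```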

After this change it should look like this:

(?<!U\.S\.)[!?.] (?=$|\s)

And finally, the lookbehind is checked at the position just before the punctuation mark that [!?.] matches. For "U.S." that punctuation mark is the final period, so the text in front of it is only "U.S". We therefore need to remove the last period from U.S. in the lookbehind for it to work properly.

The final expression should look like this:

(?<!U\.S)[!?.] (?=$|\s)
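Putting it together, the following is a sketch rather than a drop-in replacement for the code in the question. The helper name addDoubleSpace and the sample text are hypothetical. It also departs from the original in two ways: it captures the punctuation and puts it back with $1 so that ! and ? are not turned into periods, and it matches one or more whitespace characters after the punctuation, since the exact spacing in the patterns above is hard to tell from the post.

```javascript
// Hypothetical helper: put two spaces after sentence-ending punctuation,
// skipping the final period of "U.S." by way of a negative lookbehind.
function addDoubleSpace(text) {
  // (?<!U\.S)  do not match if the punctuation is directly preceded by "U.S"
  // ([!?.])    capture the punctuation so it can be reused as $1
  // \s+        the whitespace that currently follows it
  // (?=\S)     only when more text follows (skips the end of the string)
  return text.replace(/(?<!U\.S)([!?.])\s+(?=\S)/g, "$1  ");
}

const input = "I live in the U.S. now. It is nice! Right? Yes.";
console.log(addDoubleSpace(input));
// "I live in the U.S. now.  It is nice!  Right?  Yes."
```

One limitation of this approach: a sentence that really does end in "U.S." is skipped as well, because the pattern cannot tell the abbreviation apart from the end of a sentence.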