How to get uppercase words in a sentence using JavaScript

I am trying to get this:

Input text: "Thai, or Central Thai, is a Tai language of the Kra–Dai language family spoken by the Central Thai people and a vast majority of Thai Chinese. It is the sole official language of Thailand. I want to... "

The expected output: ["Thai", "Central Thai", "Kra–Dai", "Thai Chinese", "Thailand"]

Then, using the Wikipedia API, I will get the definitions of the words above. I am using this regular expression:

[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?

However, when I try it, the result is:

["Thai", "Central Thai", "Kra", "Dai", "Thai Chinese", "Thailand", "I", "It"]

It splits the words that contain "–", and it also includes words that start with an uppercase letter after a period ".", such as "I" and "It".

How could I get all uppercase words except the uppercase word that comes right after a "."?

CodePudding user response：

We can use word boundaries \b.

let str = 'Thai, or Central Thai, is a Tai language of the Kra-Dai language family spoken by the Central Thai people and a vast majority of Thai Chinese. It is the sole official language of Thailand. I want to...';
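// \b[A-Z]\w{2,} requires at least three characters, so "It" and "I" are skipped;
// the optional group picks up a second capitalized word, as in "Central Thai".
let arr = [...str.matchAll(/\b[A-Z]\w{2,}-?(\s?\b[A-Z]\w*)?/g)].map(e => e[0]);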
console.log(arr);
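
Note that the text in the question spells "Kra–Dai" with an en dash (U+2013), while the snippet above uses a plain hyphen, and that repeated phrases such as "Central Thai" come back twice. Here is a minimal sketch of one way to handle both, assuming the same pattern with a small character class added and a Set for de-duplication (the names text, pattern and unique are only illustrative):

```javascript
const text = 'Thai, or Central Thai, is a Tai language of the Kra–Dai language family spoken by the Central Thai people and a vast majority of Thai Chinese. It is the sole official language of Thailand. I want to...';

// A capitalized word of 3+ characters, optionally followed by a hyphen or an
// en dash (U+2013) and/or one space, and then a second capitalized word.
const pattern = /\b[A-Z]\w{2,}[-\u2013]?(?:\s?\b[A-Z]\w*)?/g;

// A Set keeps the first occurrence of each phrase and drops the repeats.
const unique = [...new Set(text.match(pattern) ?? [])];
console.log(unique);
// -> ["Thai", "Central Thai", "Tai", "Kra–Dai", "Thai Chinese", "Thailand"]
```

"Tai" shows up as well because it is capitalized in the text; filter it out afterwards if you do not want it.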

Regarding your question, "How could I get all uppercase words except the uppercase word after '.'":

Keep in mind that a sentence can also start with a special word that you would want to keep, for example:
Periplectic group consists of the group's last common ancestor and all its descendants
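
One possible way to keep such sentence-initial terms while still dropping words like "It" and "I" is to filter by a stop list instead of by position. The sketch below is only a heuristic under that assumption: STOP_WORDS and capitalizedPhrases are hypothetical names, and the list would need to be extended for real texts.

```javascript
// Hypothetical stop list: common words that are usually capitalized only
// because they start a sentence (plus the pronoun "I").
const STOP_WORDS = new Set(['I', 'It', 'The', 'A', 'An', 'This', 'That', 'He', 'She', 'We', 'They']);

function capitalizedPhrases(text) {
  // One or more capitalized words joined by a space, hyphen, or en dash (U+2013).
  const pattern = /\b[A-Z][a-z]*(?:[ \-\u2013][A-Z][a-z]*)*/g;
  const phrases = [];
  for (const match of text.match(pattern) ?? []) {
    const words = match.split(' ');
    // Drop a leading stop word, so "The Thai" becomes "Thai" and "It" disappears.
    if (STOP_WORDS.has(words[0])) words.shift();
    if (words.length > 0) phrases.push(words.join(' '));
  }
  // Remove duplicates while keeping the order of first appearance.
  return [...new Set(phrases)];
}

console.log(capitalizedPhrases("Periplectic group consists of the group's last common ancestor and all its descendants. It is the sole official language of Thailand."));
// -> ["Periplectic", "Thailand"]
```

The trade-off is that any ordinary sentence starter missing from the stop list will still slip through.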

CodePudding user response：

This worked for me:

(?!.\s)[A-Z][a-z]+(?:\s[A-Z][a-z]+|[–]+[A-Z][a-z]*|[a-z]+)
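
If you literally want to skip any capitalized word that comes right after sentence-ending punctuation, a negative lookbehind is another option. This is a rough sketch, assuming a JavaScript engine with lookbehind support (current Node.js and modern browsers); the names question, properNouns and found are only illustrative:

```javascript
const question = 'Thai, or Central Thai, is a Tai language of the Kra–Dai language family spoken by the Central Thai people and a vast majority of Thai Chinese. It is the sole official language of Thailand. I want to...';

// (?<![.!?]\s)  the match must not come right after sentence-ending
//               punctuation followed by whitespace (negative lookbehind).
// [A-Z][a-z]+   a capitalized word of at least two letters.
// (?:...)*      further capitalized words joined by a space, hyphen, or
//               en dash (U+2013).
const properNouns = /(?<![.!?]\s)\b[A-Z][a-z]+(?:[ \u2013-][A-Z][a-z]+)*/g;

// De-duplicate while keeping the order of first appearance.
const found = [...new Set(question.match(properNouns) ?? [])];
console.log(found);
// -> ["Thai", "Central Thai", "Tai", "Kra–Dai", "Thai Chinese", "Thailand"]
```

The very first word of the text is kept because nothing precedes it, and any proper noun that happens to start a later sentence is lost with this approach.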