Match non-English words using Regex in JavaScript

Time:08-11

Here is an example sentence:

क्या आप क्लोज़अप करते हैं 

I want to extract the first word क्या from this sentence using Regex. I can do so in English by using (^\w+) but that doesn't work with other alphabets.

How should I proceed?

CodePudding user response:

You need to add the u flag for Unicode support:

const str = 'क्या आप क्लोज़अप करते हैं ';

console.log('Letters and combining marks: ' + str.match(/^[\p{L}\p{M}]+/u))
console.log('Anything but space: ' + str.match(/^[^\p{Zs}]+/u))

Result:

Letters and combining marks: क्या
Anything but space: क्या

Explanation:

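  • both regexes use ^ to anchor at the beginning of the string
  • regex 1: [\p{L}\p{M}]+ - one or more letters and combining marks (the vowel signs and virama in Devanagari are marks, not letters)
  • regex 2: [^\p{Zs}]+ - one or more characters that are not a space separator (\p{Zs} covers the Unicode space characters, but not tabs or line breaks)
  • the u flag enables Unicode mode so that you can use \p{...} Unicode property escapes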

See details at https://javascript.info/regexp-unicode
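
To make the role of \p{M} concrete, here is a minimal sketch that wraps the first regex in a helper. The name firstWord is hypothetical and only used for illustration; the sample sentence is the one from the question.

```javascript
// Hypothetical helper (not part of any library): returns the leading run of
// letters and combining marks, or null if the text does not start with one.
function firstWord(text) {
  const match = text.match(/^[\p{L}\p{M}]+/u);
  return match ? match[0] : null;
}

const sentence = 'क्या आप क्लोज़अप करते हैं ';

// Letters plus combining marks capture the whole first word.
console.log(firstWord(sentence)); // क्या

// \p{L} alone stops at the first combining mark (the virama after क).
console.log(sentence.match(/^\p{L}+/u)[0]); // क
```

The helper returns null rather than throwing when the text starts with something else, such as a digit or a space, so callers can check for a missing match.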

CodePudding user response:

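You can use this regex to extract the first word:

^[\pL]+

In JavaScript the property name needs braces and the regex needs the u flag, so write it as /^\p{L}+/u. For scripts with combining marks, such as Devanagari, add \p{M} to the character class, or the match stops at the first mark.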

CodePudding user response:

Try the following regex:

[^\x00-\x7F]+
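
A short sketch of how this pattern behaves, assuming it is used without an anchor as written. The variable names and the mixed-language sample strings are made up for illustration. The pattern matches the first run of non-ASCII characters wherever it occurs, and it does not distinguish letters from non-ASCII punctuation.

```javascript
// Matches one or more characters outside the ASCII range (U+0000 to U+007F).
const nonAscii = /[^\x00-\x7F]+/;

// No ^ anchor: the leading ASCII word is skipped and the first
// non-ASCII run is returned instead.
const mixed = 'Hello क्या आप';
console.log(mixed.match(nonAscii)[0]); // क्या

// Non-ASCII punctuation such as the danda (।) is part of the match too.
const withDanda = 'हैं। ok';
console.log(withDanda.match(nonAscii)[0]); // हैं।
```

If the word has to be at the start of the string, prefix the pattern with ^. If punctuation should be excluded, the \p{L}\p{M} approach is the safer choice.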