Merging 2 regex that allow only English and Arabic characters

Time:08-14

I have a string from which I want to remove every other character, such as digits and punctuation (0..9!@#$%^&*()_., ...), and keep only alphabetic characters.

After some searching and testing, I came up with two regexes:

String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z]", "");
str = str.replaceAll("\\P{InArabic}", "");
System.out.println(str);

The desired output is "hello مرحبا ok".

But of course, this returns an empty string: the first regex removes every non-Latin character, and the second then removes every non-Arabic character, so nothing survives.
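To make the failure concrete, here is a small runnable sketch that applies the question's two filters one after the other and prints each intermediate result. The class and method names (ChainedFilters, keepLatin, keepArabic) are hypothetical, and the Arabic word is written as Unicode escapes (\u0645\u0631\u062D\u0628\u0627 is مرحبا) so the file compiles regardless of the source encoding.

```java
public class ChainedFilters {

    // Step 1 from the question: delete everything that is not an ASCII letter.
    // Digits, punctuation, spaces and Arabic letters are all removed here.
    static String keepLatin(String input) {
        return input.replaceAll("[^a-zA-Z]", "");
    }

    // Step 2 from the question: delete everything outside the Unicode Arabic block.
    static String keepArabic(String input) {
        return input.replaceAll("\\P{InArabic}", "");
    }

    public static void main(String[] args) {
        // The question's sample input; the Arabic word is written as Unicode escapes.
        String str = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";
        String step1 = keepLatin(str);
        String step2 = keepArabic(step1);
        System.out.println("after step 1: " + step1);        // after step 1: hellook
        System.out.println("after step 2: [" + step2 + "]"); // after step 2: []
    }
}
```

Because the characters kept by step 1 (ASCII letters) and those kept by step 2 (Arabic-block characters) do not overlap, chaining the filters keeps only their intersection, which is empty. Putting both allowed sets into one character class keeps their union instead.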

My question is: how can I merge these two regexes into one that keeps only Arabic and English characters?

Answer:

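Merge both into a single negated character class. Inside it, use lowercase \\p{InArabic} (rather than \\P{InArabic}), since the negation is already handled by the leading ^. No quantifier is needed (though adding + wouldn't hurt), because replaceAll removes every match anyway: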

String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z \\p{InArabic}]", "");
System.out.println(str);

Prints:

hello مرحبا ok
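A quick way to check that the quantifier is optional is to run both forms side by side. This is a hedged sketch; MergedFilter, SINGLE, RUN and clean are illustrative names, not part of the answer.

```java
public class MergedFilter {

    // One negated class: anything that is NOT an ASCII letter, a space,
    // or a character in the Unicode Arabic block gets removed.
    static final String SINGLE = "[^a-zA-Z \\p{InArabic}]";

    // Same class with a + quantifier: whole runs of unwanted characters
    // are removed in a single match.
    static final String RUN = "[^a-zA-Z \\p{InArabic}]+";

    static String clean(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // The question's sample input, with the Arabic word as Unicode escapes.
        String str = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";
        String a = clean(str, SINGLE);
        String b = clean(str, RUN);
        // replaceAll deletes every match, so both patterns yield the same string.
        System.out.println(a.equals(b)); // true
    }
}
```

The + version makes fewer replacements on long runs of unwanted characters ("123" is one match instead of three), but the final string is identical.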

Note: based on your expected result, you want to keep spaces, so a space is included in the character class.
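One caveat worth checking: \p{InArabic} matches the whole Unicode Arabic block (U+0600 to U+06FF), not only letters, so the merged class also keeps Arabic-Indic digits such as ١٢٣ and Arabic punctuation such as the Arabic comma ،. If those should go too, one option (a sketch of my own, not part of the answer above) is a second alternative that removes Arabic-block characters that are neither letters (\p{L}) nor combining marks (\p{M}, which covers short-vowel diacritics). It relies on Java's character-class intersection operator &&; the class and constant names are hypothetical.

```java
public class ArabicLettersOnly {

    // Alternative 1 (the answer's class): remove anything that is not an ASCII
    // letter, a space, or a character in the Unicode Arabic block.
    // Alternative 2 (added here): remove Arabic-block characters that are neither
    // letters (\p{L}) nor combining marks (\p{M}), e.g. Arabic-Indic digits and
    // the Arabic comma. Diacritics such as fatha are marks, so they survive.
    static final String LETTERS_ONLY =
            "[^a-zA-Z \\p{InArabic}]|[\\p{InArabic}&&[^\\p{L}\\p{M}]]";

    static String clean(String input) {
        return input.replaceAll(LETTERS_ONLY, "");
    }

    public static void main(String[] args) {
        // Arabic-Indic digits 1-2-3, the word from the question, an Arabic comma,
        // then Latin text and ASCII digits (all non-ASCII as Unicode escapes).
        String str = "\u0661\u0662\u0663 \u0645\u0631\u062D\u0628\u0627\u060C ok 123";
        System.out.println("[" + clean(str) + "]");
    }
}
```

The result still contains the spaces that surrounded the removed digits, so a trim() or a collapse of repeated spaces may be wanted afterwards.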
