Java Regex to match Chinese and/or ordinary numbers

Time:07-20

The regex I have matches everything except the Chinese characters, but it also matches the numbers, which I don't want. As the regex demo below shows, the number 45 inside the Chinese sentence is matched, but I need it to be excluded too.

https://regex101.com/r/XNtD12/1

Current regex is: (?!\p{IsHan}\n)[^\p{IsHan}\n?。,?!]

Desired output:

He is 45 today <- matched 100%
你今天45岁了 <- not matched at all
这个句子没有数字 <- not matched at all
Ok I see <- matched 100%

Java code being used:

String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("^[^\\p{IsHan}\\n?。,?!]+$", ""));

Answer:

In your pattern you can omit the lookahead (?!\p{IsHan}\n), as the negated character class that directly follows it already does not match \p{IsHan}.
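
To illustrate why the lookahead is redundant, here is a minimal sketch that collects the matches of the pattern with and without it and compares the two lists. The class name LookaheadCheck and its helper methods are made up for this example; the patterns are the ones from the question, used exactly as written (one character per match).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class LookaheadCheck {
    // Pattern from the question, with and without the leading lookahead.
    static final String WITH_LOOKAHEAD = "(?!\\p{IsHan}\\n)[^\\p{IsHan}\\n?。,?!]";
    static final String WITHOUT_LOOKAHEAD = "[^\\p{IsHan}\\n?。,?!]";

    // Collect every match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // True when both variants produce exactly the same matches.
    static boolean sameMatches(String input) {
        return allMatches(WITH_LOOKAHEAD, input)
                .equals(allMatches(WITHOUT_LOOKAHEAD, input));
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        System.out.println(sameMatches(example));
        // Without anchors, the digits inside the Chinese line still match.
        System.out.println(allMatches(WITHOUT_LOOKAHEAD, "你今天45岁了"));
    }
}
```

The second call also shows the problem from the question: without anchors, the digits in the Chinese sentence are matched one by one.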

If you don't want partial matches, add anchors to the start and the end of the pattern, and enable multiline mode with the inline modifier (?m).

String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("(?m)^[^\\p{IsHan}\\n?。,?!]+$", ""));

See a regex demo and a Java demo
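
As a self-contained sketch of the anchored approach, the class below wraps the replaceAll call in a method and runs it on the example string. The names HanLineFilter and blankNonChineseLines are invented for illustration, and the pattern assumes a + quantifier after the character class, so that a whole line has to match between ^ and $.

```java
// Hypothetical wrapper class, not part of the original answer.
class HanLineFilter {
    // (?m) lets ^ and $ match at the start and end of every line.
    // A line matches only if it has no Han character and none of
    // the listed punctuation marks, so mixed lines are left alone.
    static final String FULL_LINE = "(?m)^[^\\p{IsHan}\\n?。,?!]+$";

    // Replace each fully matching line with an empty string.
    // The newline characters themselves are not removed.
    static String blankNonChineseLines(String text) {
        return text.replaceAll(FULL_LINE, "");
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        System.out.println(blankNonChineseLines(example));
    }
}
```

With this pattern the 45 in the Chinese sentence is no longer touched, because the match can only start at the beginning of a line and the first character there is a Han character. The matched lines become empty, but their line breaks stay in the output.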

If you want to remove optional trailing newlines using replaceAll:

^[^\\p{IsHan}\\n?。,?!]+$\\R?
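
A minimal sketch of this variant follows, assuming the (?m) modifier from the earlier example is kept and the character class is followed by a + quantifier. The class and method names are made up for illustration; \R is the linebreak matcher available in Java 8 and later.

```java
// Hypothetical wrapper class, not part of the original answer.
class HanLineRemover {
    // Same full-line pattern, followed by an optional linebreak (\R?)
    // so that a removed line takes its line terminator with it.
    static final String FULL_LINE_WITH_BREAK =
            "(?m)^[^\\p{IsHan}\\n?。,?!]+$\\R?";

    // Remove each fully matching line together with its linebreak.
    static String removeNonChineseLines(String text) {
        return text.replaceAll(FULL_LINE_WITH_BREAK, "");
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        System.out.print(removeNonChineseLines(example));
    }
}
```

Note that \R? only consumes the linebreak after a removed line. When the last line of the input is removed, the linebreak before it belongs to the previous line and stays in the result.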