What should I add to a regular expression to remove punctuation marks that appear more than 1 time?

Time:09-26

I am trying to write a regular expression that removes letters from other languages and collapses punctuation marks that are repeated more than once in a row.

To remove letters from other languages, I use this expression:

st = test.replaceAll("[^ a-zA-Z0-9]", "");

But I don't understand what I should add so that it removes not all punctuation marks and spaces, but only those that are repeated more than once in a row:

String test = "agagahh,,,mvf .... AJFKL ???";

I would be grateful for any help.

Input: "agagahh,,,mvf .... AJFKL ???"

Output: "agagahh,mvf . AJFKL ?"

CodePudding user response:

You can first remove all characters that are not alphanumeric, a space, or one of the accepted punctuation marks. Then you can use a capturing group and a backreference to match a space or punctuation mark followed by one or more copies of the same character, and replace the whole run with a single copy.

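String str = "agagahh,,,mvf ....      AJFKL  ???";
// Step 1: keep only spaces, ASCII letters, digits, and the marks . ? ,
// Step 2: \\1+ matches one or more repeats of the captured character
String res = str.replaceAll("[^ a-zA-Z0-9.?,]", "").replaceAll("([ .,?])\\1+", "$1");
System.out.println(res); // agagahh,mvf . AJFKL ?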
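
For reuse, the same two steps could be wrapped in a small helper. This is only a sketch: the class name PunctuationCleaner and the method clean are hypothetical, and the set of accepted marks (. , ?) is assumed from the example input. It uses A-Z rather than A-z, because the range A-z also covers the characters [ \ ] ^ _ and the backtick, which sit between Z and a in ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names here are illustrative, not from the question.
final class PunctuationCleaner {
    // Anything that is not a space, an ASCII letter or digit, or one of . , ?
    private static final Pattern DISALLOWED = Pattern.compile("[^ a-zA-Z0-9.,?]");
    // A space or accepted mark followed by one or more copies of itself.
    private static final Pattern REPEATED = Pattern.compile("([ .,?])\\1+");

    static String clean(String input) {
        // Strip disallowed characters first, so that marks which become
        // adjacent after the removal are collapsed in the second step.
        String allowedOnly = DISALLOWED.matcher(input).replaceAll("");
        return REPEATED.matcher(allowedOnly).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(clean("agagahh,,,mvf ....      AJFKL  ???"));
    }
}
```

Precompiling the patterns avoids recompiling them on every call, which String.replaceAll would otherwise do. Only runs of the same character are collapsed, so a sequence such as ".," is left unchanged.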
Tags: java