I need to remove certain characters (special, non-printable, and other unwanted characters) from a string, as shown below:
private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";
stringValue.replaceAll(NON_ASCII_CHARACTERS, "").replaceAll(ASCII_CONTROL_CHARACTERS, "")
.replaceAll(NON_PRINTABLE_CHARACTERS, "");
Can the code above be refactored to use a single "replaceAll" call that covers all three conditions?
Is there a way to do this? Please advise.
CodePudding user response:
You can use the regex alternation (OR) operator |:
private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";
public static String process(String stringValue) {
return stringValue.replaceAll(NON_ASCII_CHARACTERS + "|" + ASCII_CONTROL_CHARACTERS + "|" + NON_PRINTABLE_CHARACTERS, "");
}
public static void main(String[] args) {
String val = process("A9339a0zzz]3");
System.out.println(val);
}
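Since String.replaceAll compiles its regex on every call, one possible variation is to compile the alternation once and reuse it. The following is only a sketch: the class name AlternationCleaner, the field UNWANTED, and the sample input are made up for illustration.

```java
import java.util.regex.Pattern;

public class AlternationCleaner {
    private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
    private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
    private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";

    // Hypothetical name: the three patterns joined with the alternation operator, compiled once.
    private static final Pattern UNWANTED = Pattern.compile(
            NON_ASCII_CHARACTERS + "|" + ASCII_CONTROL_CHARACTERS + "|" + NON_PRINTABLE_CHARACTERS);

    public static String process(String stringValue) {
        // Every character matching any of the three alternatives is removed.
        return UNWANTED.matcher(stringValue).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input: BEL (U+0007) is a control character, U+00E9 is non-ASCII.
        System.out.println(process("A\u0007b\u00e9c")); // prints "Abc"
    }
}
```

One thing worth checking against your requirements: \p{C} includes the Unicode control category, which covers tab, carriage return, and line feed, so the third alternative removes those characters even though the second one spares them.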
CodePudding user response:
Code point
You might consider an alternative to regex. You can take the integer code point of each character and query the Character class for that character's category.
String input = … ;
String output =
input
.codePoints() // Returns an `IntStream` of code point `int` values.
.filter( codePoint -> ! Character.isISOControl( codePoint ) ) // Filter for the characters you want to keep. Those code points flunking the `Predicate` test will be omitted.
.collect( StringBuilder :: new , StringBuilder :: appendCodePoint , StringBuilder :: append ) // Convert the `int` code point integers back into characters.
.toString() ; // Make a `String` from the contents of the `StringBuilder`.
The Character class defines many of the classifications specified by the Unicode Consortium. You can use them to narrow down the stream of code points to those which represent your desired characters.
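As a rough sketch of how the question's rules might be expressed with this approach, assuming the intent is to keep only ASCII characters that are not control characters, plus tab, carriage return, and line feed. The class name CodePointCleaner and the helper keep are hypothetical, and this is not guaranteed to match the regex chain exactly (for instance, \p{C} in the question also removes tab, CR, and LF).

```java
public class CodePointCleaner {
    // Hypothetical predicate: true for code points that should survive.
    static boolean keep(int codePoint) {
        if (codePoint > 0x7F) {
            return false; // Outside the ASCII range.
        }
        if (codePoint == '\r' || codePoint == '\n' || codePoint == '\t') {
            return true; // Assumed to be wanted, as in the question's second pattern.
        }
        // Within ASCII, isISOControl is true for U+0000..U+001F and U+007F.
        return !Character.isISOControl(codePoint);
    }

    static String clean(String input) {
        return input
                .codePoints()
                .filter(CodePointCleaner::keep)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Illustrative input: BEL (U+0007) and U+00E9 are dropped, the tab is kept.
        System.out.println(clean("A\u0007b\u00e9c\td"));
    }
}
```

Keeping the rules in a named predicate makes each condition easy to read and to adjust, at the cost of more lines than a single regex.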
CodePudding user response:
According to the Pattern javadocs, it should also be possible to combine the three character class patterns into a single character class:
private static final String NON_ASCII_CHARACTERS = "[^\\x00-\\x7F]";
private static final String ASCII_CONTROL_CHARACTERS = "[\\p{Cntrl}&&[^\r\n\t]]";
private static final String NON_PRINTABLE_CHARACTERS = "\\p{C}";
becomes
private static final String COMBINED =
"[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]";
or
private static final String COMBINED =
"[" + NON_ASCII_CHARACTERS + ASCII_CONTROL_CHARACTERS
+ NON_PRINTABLE_CHARACTERS + "]";
Note that && (intersection) has lower precedence than the implicit union operator, so all of the [ and ] meta-characters in the above are required.
You decide which version you think is clearer. It is a matter of opinion.
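If you want some reassurance that the combined class behaves like the original chain, a quick comparison along these lines may help. This is only a sketch: the class name CombinedClassCleaner, the method names, and the sample input are invented for illustration, and a single sample is not a proof of equivalence.

```java
import java.util.regex.Pattern;

public class CombinedClassCleaner {
    // The single character class from above, compiled once.
    private static final Pattern COMBINED =
            Pattern.compile("[[^\\x00-\\x7F][\\p{Cntrl}&&[^\r\n\t]]\\p{C}]");

    public static String clean(String s) {
        return COMBINED.matcher(s).replaceAll("");
    }

    // The original three-step chain from the question, for comparison.
    public static String cleanChained(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "")
                .replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
                .replaceAll("\\p{C}", "");
    }

    public static void main(String[] args) {
        // Illustrative input: BEL (U+0007), U+00E9, and a zero-width space (U+200B).
        String sample = "A\u0007b\u00e9c\u200Bd";
        System.out.println(clean(sample).equals(cleanChained(sample)));
    }
}
```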