Replacing all cases of ISO control characters in a string with "CTRL"

Time:08-02

 static String clean(String identifier) {
    String firstString = "";
    for (int i = 0; i < identifier.length(); i++)
        if (Character.isISOControl(identifier.charAt(i))){
            firstString = identifier.replaceAll(identifier.charAt(i), 
                          "CTRL");
         }
            
        return firstString;
}

The idea behind the code above is to replace every ISO control character in the string 'identifier' with "CTRL". However, the compiler reports this error: "char cannot be converted to java.lang.String"

Can someone help me to solve and improve my code to produce the right output?

CodePudding user response:

String#replaceAll takes a String as its first parameter, and that String is interpreted as a regular expression. Use String#replace instead.

EDIT: I hadn't noticed that you want to replace a character with a string. In that case, you can use the String#replace overload that takes CharSequence arguments, but you need to convert the character to a String first, e.g. by using Character.toString.

Update

Example:

String text = "AB\003DE";
text = text.replace(Character.toString('\003'), "CTRL");
System.out.println(text);
// gives: ABCTRLDE
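
The example above replaces one known character. To handle every ISO control character, as the question asks, one option is to build the result character by character instead of calling replace inside the loop. The following is only a sketch: the class name ControlCharCleaner is made up for illustration, and the method assumes a non-null input.

```java
final class ControlCharCleaner {

    // Returns a copy of identifier in which every ISO control character
    // (as reported by Character.isISOControl) is replaced by "CTRL".
    // Assumes identifier is not null.
    static String clean(String identifier) {
        StringBuilder result = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (Character.isISOControl(c)) {
                result.append("CTRL");   // Substitute the placeholder text.
            } else {
                result.append(c);        // Keep ordinary characters as-is.
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("AB\003DE")); // prints: ABCTRLDE
    }
}
```

Appending to a StringBuilder avoids the problem in the original loop, where each iteration overwrote firstString with a replacement computed from the unmodified identifier.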

CodePudding user response:

Code points, and Control Picture characters

I can add two points:

  • The char type has been essentially broken since Java 2, and legacy since Java 5, because a single 16-bit char cannot represent every Unicode character. It is best to use code point integers when working with individual characters.
  • Unicode defines visible characters, called Control Pictures, that serve as display placeholders for control characters. See the Control Pictures section of one Wikipedia page, and see another page, Control Pictures.

For example, the NULL character at code point 0 decimal has a matching SYMBOL FOR NULL character at code point 9,216 decimal (U+2400): ␀. To see all the Control Picture characters, use this PDF section of the Unicode standard specification.
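
As a small illustration of that offset, the sketch below maps a single control code point to its picture. The class and method names are hypothetical, and the method assumes its argument is in the range 0 to 32, where a simple offset of 9,216 applies.

```java
final class ControlPictureDemo {

    // 9,216 decimal is U+2400, the start of the Control Pictures block.
    static final int CONTROL_PICTURES_START = 0x2400;

    // Returns the Control Picture for a code point in the range 0..32.
    // Assumption: the caller has already checked the range.
    static String pictureFor(int controlCodePoint) {
        int picture = CONTROL_PICTURES_START + controlCodePoint;
        return new String(Character.toChars(picture));
    }

    public static void main(String[] args) {
        System.out.println(pictureFor(0));  // SYMBOL FOR NULL, U+2400
        System.out.println(pictureFor(10)); // SYMBOL FOR LINE FEED, U+240A
    }
}
```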

Get an array of the code point integers representing each of the characters in your string.

int[] codePoints = myString.codePoints().toArray() ; 
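
To see why code points are preferred over char here, consider a string containing a character outside the Basic Multilingual Plane. This is a minimal sketch; the class name and the toCodePoints helper are invented for the example.

```java
final class CodePointDemo {

    // Thin wrapper around the call shown above, named for illustration only.
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which Java stores as two char values
        // (a surrogate pair).
        String s = "a\uD83D\uDE00";

        System.out.println(s.length());              // prints: 3 (char units)
        System.out.println(toCodePoints(s).length);  // prints: 2 (code points)
    }
}
```

Looping over char values would visit the two halves of the surrogate pair separately, while looping over code points visits each character once.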

Loop those code points. Replace those of interest.

Here is some untested code.

int[] replacedCodePoints = new int[ codePoints.length ] ;
int index = 0 ;
for ( int codePoint : codePoints )
{
    if( codePoint >= 0 && codePoint <= 32 ) // 32 is SPACE, so you may want to use 31 depending on your context.
    {
        replacedCodePoints[ index ] = codePoint + 9_216 ;  // 9,216 is the offset to the beginning of the Control Picture character range defined in Unicode.
    } else if ( codePoint == 127 )  // DEL character.
    {
        replacedCodePoints[ index ] = 9_249 ;
    } else  // Any other character, we keep as-is, no replacement.
    {
        replacedCodePoints[ index ] = codePoint ;
    }
    index++ ;  // Set up the next loop.
}

Convert code points back into text. Use StringBuilder#appendCodePoint to build up the characters of text. You can use the following stream-based code as boilerplate. For explanation, see this Question.

// Requires: import java.util.Arrays ;
String result = 
    Arrays
        .stream( replacedCodePoints )
        .collect( StringBuilder::new , StringBuilder::appendCodePoint , StringBuilder::append )
        .toString();
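
Putting the steps together, the whole conversion can also be written as one pass over the code point stream, without the intermediate arrays. This is a sketch under the same assumptions as the code above (code points 0 to 32 and 127 are replaced, everything else is kept); the class and method names are hypothetical.

```java
final class ControlPictures {

    // Replaces code points 0..32 and 127 with their Control Picture
    // counterparts and leaves every other character untouched.
    static String toControlPictures(String input) {
        return input.codePoints()
                .map(cp -> {
                    if (cp >= 0 && cp <= 32) {
                        return cp + 9_216;   // Offset into the Control Pictures range.
                    }
                    if (cp == 127) {
                        return 9_249;        // SYMBOL FOR DELETE.
                    }
                    return cp;               // Any other character, kept as-is.
                })
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toControlPictures("AB\003DE"));
    }
}
```

Because 32 is included, ordinary spaces are also shown as SYMBOL FOR SPACE; change the upper bound to 31 if spaces should stay as they are.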