I have a Java Spring Boot 2.7.2 application that works fine on localhost (run as a jar file). It opens a file and removes special characters, but on my Tomcat 9 (war), with the same JDK 11, not all of the characters are removed.
try (Scanner scanner = new Scanner(is);
     BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
    String line;
    while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        writer.append(line.replace("<br>", "")
                          .replace('\u00A0', ' ')
                          .replace("> ", ">"));
        writer.newLine();
    }
}
In Notepad the file shows the byte xC2, and the character is identified as (U+00A0 : NO-BREAK SPACE [NBSP]).
Why do I have such a difference?
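A quick way to see whether the two environments actually differ is to print the JVM's default charset in both. A minimal sketch (the class name is illustrative):

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // Prints the JVM default charset; run this both from the jar on
        // localhost and inside the Tomcat 9 webapp and compare the values.
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));
    }
}
```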
CodePudding user response:
I recommend using Spring Boot version 2.5.7, because you are running JDK 17 on it and not a JAR file! It does this automatically! 2.5.7 is much more convenient!
CodePudding user response:
The bytes C2 A0 are the UTF-8 encoding of the non-breaking space (U+00A0). So your server probably writes Unicode in the multi-byte form of UTF-8.
FileWriter and FileReader are old utility classes that use the default platform encoding for writing text. That is non-portable.
Path path = file.toPath();
Charset charset = Charset.forName("Windows-1251"); // Russian, for instance.
try (BufferedReader reader = new BufferedReader(
         new InputStreamReader(is, StandardCharsets.UTF_8));
     BufferedWriter writer = Files.newBufferedWriter(path, charset)) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.append(line.replace("<br>", "")
                          .replace('\u00A0', ' ')
                          .replace("> ", ">"));
        writer.newLine();
    }
}
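To see why the replacement silently fails when the encodings disagree, consider what happens when UTF-8 bytes are decoded with a single-byte charset (ISO-8859-1 here stands in for whatever the server's default is): the two bytes of the NBSP become two separate characters, and `replace('\u00A0', ' ')` only removes the second one.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        String original = "a\u00A0b";                            // contains a NO-BREAK SPACE
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // bytes 61 C2 A0 62

        // Decoding those UTF-8 bytes with a single-byte charset splits the
        // NBSP (C2 A0) into two characters: U+00C2 (Â) and U+00A0 (NBSP).
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                    // 4, not 3
        // The replace removes only the NBSP half; the stray Â survives:
        System.out.println(misread.replace('\u00A0', ' '));
    }
}
```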
Best would be to have the InputStream in full Unicode, i.e. UTF-8. The output file is best given some explicit encoding (Charset).
On Windows you might let editors recognize UTF-8 by an invisible zero-width no-break space at the start of the file, the BOM ("byte order mark"):
Charset charset = StandardCharsets.UTF_8;
...
writer.write("\uFEFF"); // Unicode BOM marker
String line;
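Putting the pieces together, a minimal sketch of the whole pattern, assuming the input stream is UTF-8 and that Windows editors are the consumers of the output file (the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanFile {
    static void clean(InputStream is, Path out) throws IOException {
        // Explicit UTF-8 on both sides, so the result does not depend on
        // the default charset of whichever JVM (jar or Tomcat) runs this.
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(is, StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF');                 // BOM, so Windows tools detect UTF-8
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replace("<br>", "")
                                 .replace('\u00A0', ' ')
                                 .replace("> ", ">"));
                writer.newLine();
            }
        }
    }
}
```

Note that the BOM is written once, before the loop, not per line.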