Java Spring Boot replace('\u00A0',' ') doesn't work on my Tomcat server but

Time:08-24

I have a Java Spring Boot 2.7.2 application that works fine on localhost (as a jar file): it opens a file and removes special characters. But on my Tomcat 9 server (as a war), with the same JDK 11, not all of the characters are removed.

try (Scanner scanner = new Scanner(is);
        BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
    String line;
    while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        writer.append(line.replace("<br>","")
                        .replace('\u00A0',' ')
                        .replace("> ",">")
        );
        writer.newLine();
    }
}

In Notepad, the leftover character is identified as xC2 (U+00A0: NO-BREAK SPACE [NBSP]).

Why do I get such a difference?

CodePudding user response:

I recommend using Spring Boot version 2.5.7, because you are running JDK 17 on it and not a JAR file. It does this automatically, and 2.5.7 is much more convenient.

CodePudding user response:

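The bytes C2 A0 are the UTF-8 encoding of the non-breaking space (U+00A0). So your server is probably handling the text as Unicode in its multi-byte UTF-8 form. If those two bytes are decoded with a single-byte charset instead of UTF-8, they no longer form the single character U+00A0, and the replacement cannot work as intended.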
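The encoding of U+00A0 can be checked directly. The following is a minimal sketch, not the asker's code: the class and method names are illustrative, and ISO-8859-1 is only an assumed stand-in for a single-byte platform default, since the actual default charset on the Tomcat host is not stated in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NbspBytes {
    // Formats the UTF-8 bytes of a string as upper-case hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the bytes with the given charset, then applies the
    // NBSP replacement from the question.
    static String decodeAndReplace(byte[] bytes, Charset charset) {
        return new String(bytes, charset).replace('\u00A0', ' ');
    }
}
```

Decoding the two bytes as UTF-8 yields one NBSP, which the replacement turns into a plain space. Decoding them as ISO-8859-1 yields two characters, and only the second is replaced, so a stray character from the C2 byte stays in the output.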

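FileWriter and FileReader are old utility classes that, unless told otherwise, use the platform's default encoding for reading and writing text. That is not portable. new Scanner(is) without a charset argument decodes with the platform default as well.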
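To see whether the two environments really differ, the default charset can be logged in both the jar and the war deployment. This is only a diagnostic sketch; the class name DefaultCharsetProbe is hypothetical and nothing like it appears in the original code.

```java
import java.nio.charset.Charset;

class DefaultCharsetProbe {
    // Builds a one-line description of the encoding the JVM falls back to
    // when no charset is passed to Scanner, FileReader or FileWriter.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }
}
```

Logging DefaultCharsetProbe.describe() once at startup on localhost and once on the Tomcat server shows whether the platform default is the source of the difference.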

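import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

Path path = file.toPath();
Charset charset = Charset.forName("Windows-1251"); // Output encoding; Russian, for instance.
// Decode the input explicitly as UTF-8 instead of relying on the platform default.
try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(is, StandardCharsets.UTF_8));
        BufferedWriter writer = Files.newBufferedWriter(path, charset)) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.append(line.replace("<br>","")
                        .replace('\u00A0',' ')
                        .replace("> ",">")
        );
        writer.newLine();
    }
}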

Ideally the InputStream carries full Unicode, i.e. UTF-8, and is decoded as such. The output file should likewise be given an explicit encoding (Charset).

On Windows you can help programs recognize the file as UTF-8 by starting it with an invisible zero-width no-break space, the BOM ("byte order mark"):

Charset charset = StandardCharsets.UTF_8;
...
    writer.write("\uFEFF"); // Unicode BOM marker
    String line;
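
The fragment above is cut off. A complete minimal sketch of the same idea could look like the following; the class name BomWriter and the method writeWithBom are hypothetical, and the sketch assumes the output should be UTF-8.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class BomWriter {
    // Writes the lines as UTF-8, preceded by U+FEFF.
    // In UTF-8 that character is encoded as the three bytes EF BB BF.
    static void writeWithBom(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("\uFEFF"); // Unicode BOM
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```

Not every consumer strips the BOM when reading, so it is worth adding only when the file is meant for Windows tools that rely on it.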