I have a Java Spring Boot 2.7.2 application that works fine on localhost (run as a jar file). It opens a file and removes special characters, but on my Tomcat 9 (war), with the same JDK 11, not all of the characters are removed.
try (Scanner scanner = new Scanner(is);
     BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
    String line;
    while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        writer.append(line.replace("<br>", "")
                          .replace('\u00A0', ' ')
                          .replace("> ", ">"));
        writer.newLine();
    }
}
In Notepad the file shows the byte xC2, and the character is identified as (U+00A0 : NO-BREAK SPACE [NBSP]).
Why do I have such a difference?
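A quick way to see whether the two environments actually differ is to print the JVM's default charset in both. A minimal sketch (the class name is illustrative):

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // Prints the JVM default charset; run this both from the jar on
        // localhost and inside the Tomcat 9 webapp and compare the values.
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));
    }
}
```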
CodePudding user response:
I recommend using Spring Boot version 2.5.7, because you are running JDK 17 on it and not a JAR file! It does this automatically! 2.5.7 is much more convenient!
CodePudding user response:
The bytes C2 A0 are the UTF-8 encoding of the non-breaking space (U+00A0). So your server probably writes Unicode in the multi-byte form of UTF-8.
FileWriter and FileReader are old utility classes that use the default platform encoding for writing text. That is non-portable.
Path path = file.toPath();
Charset charset = Charset.forName("Windows-1251"); // Russian, for instance.
try (BufferedReader reader = new BufferedReader(
         new InputStreamReader(is, StandardCharsets.UTF_8));
     BufferedWriter writer = Files.newBufferedWriter(path, charset)) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.append(line.replace("<br>", "")
                          .replace('\u00A0', ' ')
                          .replace("> ", ">"));
        writer.newLine();
    }
}
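To see why the replacement silently fails when the encodings disagree, consider what happens when UTF-8 bytes are decoded with a single-byte charset (ISO-8859-1 here stands in for whatever the server's default is): the two bytes of the NBSP become two separate characters, and `replace('\u00A0', ' ')` only removes the second one.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        String original = "a\u00A0b";                            // contains a NO-BREAK SPACE
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // bytes 61 C2 A0 62

        // Decoding those UTF-8 bytes with a single-byte charset splits the
        // NBSP (C2 A0) into two characters: U+00C2 (Â) and U+00A0 (NBSP).
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                    // 4, not 3
        // The replace removes only the NBSP half; the stray Â survives:
        System.out.println(misread.replace('\u00A0', ' '));
    }
}
```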
Best would be to have the InputStream in full Unicode, i.e. UTF-8. The output file is best given some explicit encoding (Charset).
On Windows you might let editors recognize UTF-8 by an invisible zero-width no-break space at the start of the file, the BOM ("byte order mark"):
Charset charset = StandardCharsets.UTF_8;
...
writer.write("\uFEFF"); // Unicode BOM marker
String line;
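Putting the pieces together, a minimal sketch of the whole pattern, assuming the input stream is UTF-8 and that Windows editors are the consumers of the output file (the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanFile {
    static void clean(InputStream is, Path out) throws IOException {
        // Explicit UTF-8 on both sides, so the result does not depend on
        // the default charset of whichever JVM (jar or Tomcat) runs this.
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(is, StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF');                 // BOM, so Windows tools detect UTF-8
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replace("<br>", "")
                                 .replace('\u00A0', ' ')
                                 .replace("> ", ">"));
                writer.newLine();
            }
        }
    }
}
```

Note that the BOM is written once, before the loop, not per line.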