I have an classic Java application for PC. The result of the build is a JAR file which is running on Windows machine.
The application is reading some XML files and creating an HTML document as an end result. The Xml file contains specific language characters that are not native to English.
While in development, in the IDE (Apache NetBeans 13), build - > Run the exported HTML file contains specific language characters.
When I run the JAR file, from the Project - > dist directory , HTML do not contain specific language characters.
For example characters like: č , ć , đ, š are being exported as : Ä� , while running from NetBeans they are exported as such, not as that strange symbol. The letters in question are from Serbian, Croatian and Bosnian.
When I export the project from NetBeans, I made sure to have this option enabled: Project -> Project properties -> Build -> Packaging where the "Copy Dependent Libraries" option is selected.
I am puzzled at this point. If anybody has any idea why something is working one way in IDE and other when exported please let me know.
CodePudding user response:
The likely problem is that your HTML file needs to identify its
If switching to a particular encoding causes the text to appear as expected, then you know the likely encoding.
Specify the file’s encoding
In the modern version of HTML, HTML5, UTF-8 is the default encoding assumed by the web browser. But if the web browser switches into Quirks Mode, the browser may assume another encoding. To help avoid Quirks Mode, a HTML5 document should start with <!DOCTYPE html>
.
So, best to be explicit about the encoding. Once you determine the encoding being used by your Java app creating the HTML file, either alter that app (if you have source code) to write an indicator of the encoding, or else write another Java app to edit the produced HTML file to include the indicator. If you are not a Java developer, you could use any programming language or even a shell script to edit the produced HTML file.
To indicate the encoding of an HTML5 file, add a meta
element.
For UTF-8:
<meta charset="UTF-8">
For Latin-1:
<meta charset="ISO-8859-1">
If your Java app was developed exclusively on Microsoft Windows, the developer may have knowingly or unwittingly used one of the Microsoft defined character encodings. Older versions of Java defaulted to using a character encoding specific to the host platform — but be aware in Java 18 the default changes to UTF-8 across platforms.
For more info
You can read about these issues in many places. Like here and in Wikipedia.
If you are not savvy with character sets and character encoding, I highly recommend reading the surprisingly entertaining article, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), by Joel Spolsky.