Cannot print Unicode chars even when setting PrintStream to UTF-8

Time:10-19

I'm having a problem with my log class: I can't print umlauts such as "üäß". I wrote the class to show every string that would normally go to the console in a JTextPane.

I've set the PrintStream in the method "Console" to use "UTF-8" and experimented with different fonts, but I always end up with an error character when I try to print special characters. Does anyone have a suggestion for solving this problem?

Here is my code:

private JPanel logPanel;
private JScrollPane logScrollPanel;
private JTextPane textArea;
private Thread stdOutReader;
private Thread stdErrReader;
private boolean stopThreads;
private final PipedInputStream stdOutPin = new PipedInputStream();
private final PipedInputStream stdErrPin = new PipedInputStream();
private StyledDocument doc;
private Style style;

public void Console() {
    doc = (StyledDocument) textArea.getDocument();
    style = doc.addStyle("ConsoleStyle", null);
    StyleConstants.setFontFamily(style, "MonoSpaced");
    StyleConstants.setFontSize(style, 12);

    try {
        PipedOutputStream stdOutPos = new PipedOutputStream(this.stdOutPin);
        System.setOut(new PrintStream(stdOutPos, true, "UTF-8"));
    } catch (java.io.IOException io) {
        textArea.setText("Couldn't redirect STDOUT to this console\n" + io.getMessage());
    } catch (SecurityException se) {
        textArea.setText("Couldn't redirect STDOUT to this console\n" + se.getMessage());
    }

    try {
        PipedOutputStream stdErrPos = new PipedOutputStream(this.stdErrPin);
        System.setErr(new PrintStream(stdErrPos, true, "UTF-8"));
    } catch (java.io.IOException io) {
        textArea.setText("Couldn't redirect STDERR to this console\n" + io.getMessage());
    } catch (SecurityException se) {
        textArea.setText("Couldn't redirect STDERR to this console\n" + se.getMessage());
    }

    stopThreads = false; 
    stdOutReader = new Thread(this);
    stdOutReader.setDaemon(true);
    stdOutReader.start();

    stdErrReader = new Thread(this);
    stdErrReader.setDaemon(true);
    stdErrReader.start();
}

public synchronized void run() {
    try {
        while (Thread.currentThread() == stdOutReader) {
            try {
                this.wait(100);
            } catch (InterruptedException ie) {
            }
            if (stdOutPin.available() != 0) {
                String input = this.readLine(stdOutPin);
                StyleConstants.setForeground(style, Color.black);
                doc.insertString(doc.getLength(), input, style);
                textArea.setCaretPosition(textArea.getDocument().getLength());
            }
            if (stopThreads) {
                return;
            }
        }

        while (Thread.currentThread() == stdErrReader) {
            try {
                this.wait(100);
            } catch (InterruptedException ie) {
            }
            if (stdErrPin.available() != 0) {
                String input = this.readLine(stdErrPin);
                StyleConstants.setForeground(style, Color.red);
                doc.insertString(doc.getLength(), input, style);
                textArea.setCaretPosition(textArea.getDocument().getLength());
            }
            if (stopThreads) {
                return;
            }
        }
    } catch (Exception e) {
        textArea.setText("\nConsole reports an internal error.\nThe error is: " + e);
    }
}

private synchronized String readLine(PipedInputStream in) throws IOException {
    String input = "";
    do {
        int available = in.available();
        if (available == 0) {
            break;
        }
        byte b[] = new byte[available];
        in.read(b);
        input += new String(b, 0, b.length);
    } while (!input.endsWith("\n") && !input.endsWith("\r\n") && !stopThreads);
    return input;
}

CodePudding user response:

System.setOut(new PrintStream(stdOutPos, true, "UTF-8"));

This is incorrect.

"System out" is not "the screen". It's an abstraction: the 'standard out' of your Java application. And what's that? Well, you don't know. If your app was started with java -jar myapp.jar >somefile.txt, then standard out is an OutputStream that writes to somefile.txt. If it's >PRN on Windows, it'll roll straight out of your printer. And so on.

Crucially, 'standard out', the abstraction, is fundamentally byte based, not char based. Hence, it is an OutputStream (or rather, a PrintStream, which is just a bizarro OutputStream), not a Writer (Writer is the char variant of OutputStream).
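To make the byte-based nature concrete, here is a small sketch. The class and method names are made up for illustration, and it assumes Java 10 or newer for the PrintStream(OutputStream, boolean, Charset) constructor. It wraps an in-memory byte buffer in a PrintStream and shows that the same character produces different bytes depending on the charset handed to the constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PrintStreamBytes {
    // Pushes text through a PrintStream and returns the raw bytes that came out.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // the character ü, escaped so the source file encoding does not matter
        System.out.println(encode(umlaut, StandardCharsets.UTF_8).length);      // 2 (bytes 0xC3 0xBC)
        System.out.println(encode(umlaut, StandardCharsets.ISO_8859_1).length); // 1 (byte 0xFC)
    }
}
```

Whoever receives those bytes has no way of knowing which of the two encodings produced them; it simply applies its own.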

So how does it work when you just run java -jar myapp.jar, without >stuff at the end?

Well, the shell you typed that command into will tell the OS to start the app such that standard out is hooked up either straight back to the shell that then processes it in some fashion, or hooks it to the 'console device' that it, itself, is also currently attached to.

From there, the bytes (not chars! Remember, it's byte based!) that the java app sends out, get processed by that shell or tty. Send the byte 65, and an A appears.

Crucially, the shell/tty is seeing bytes and has to translate them to characters. It does this the same way all things that convert bytes to chars or vice versa do it: With a charset encoding.
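The same rule applies when the thing consuming the bytes is your own code rather than a terminal, as with the in-process pipe in the question: whatever turns the bytes back into chars picks a charset, explicitly or by default. The following is only a sketch under that assumption, with hypothetical names; it decodes through a Reader with an explicit charset instead of relying on the default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PipeDecoding {
    // Reads one line from a byte stream, decoding with an explicitly chosen charset.
    static String readLine(InputStream in, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        PipedInputStream pin = new PipedInputStream();
        // Writer side: chars -> bytes using UTF-8.
        try (PrintStream out = new PrintStream(new PipedOutputStream(pin), true, "UTF-8")) {
            out.println("\u00FC\u00E4\u00DF"); // üäß; small enough to fit in the pipe's buffer
        }
        // Reader side: bytes -> chars using the same charset as the writer.
        String line = readLine(pin, StandardCharsets.UTF_8);
        System.out.println(line.equals("\u00FC\u00E4\u00DF"));
    }
}
```

Decoding through a single Reader also keeps a multi-byte sequence intact when it arrives split across two reads, which decoding each chunk separately with new String(bytes) does not.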

The question you have to answer is: what is that charset encoding? THAT is the charset encoding you have to specify, not UTF-8 (unless that happens to be the one in use). Setting it to UTF-8 means the chars you write to the PrintStream get turned into bytes by way of UTF-8 encoding. If the tty then decodes the bytes back to chars using, say, Cp1252, everything except ASCII chars will just turn into mojibake.
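A minimal sketch of that mismatch, with made-up names: encode with one charset, decode with another, and compare. ISO-8859-1 stands in for Cp1252 here because every Java platform is required to support it, and the two agree on the bytes involved in this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text to bytes with one charset and decodes those bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u00FC"; // the character ü
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(text)); // true: matching charsets round-trip cleanly
        System.out.println(mixed.length());    // 2: the two UTF-8 bytes became two separate chars
        // ASCII survives because both charsets encode it identically.
        System.out.println(roundTrip("abc", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // abc
    }
}
```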

Java versions up to and including 17 defaulted most byte-to-char converters, except the ones in the java.nio.file.Files class, to the 'platform native encoding', and the tty you write to usually uses the platform native encoding too. That means that up to Java 17 you could write System.setOut(new PrintStream(stdOutPos, true)) and it would just work.

However, starting with Java 18, most everything defaults to UTF-8. Thus, you have to write this atrocious disaster:

Charset nativeCharset = Charset.forName(System.getProperty("native.encoding", Charset.defaultCharset().name()));
Scanner sc = new Scanner(System.in, nativeCharset);
PrintStream out = new PrintStream(System.out, true, nativeCharset);

Asking for the native.encoding property is the only way, but before Java 17 that system property doesn't exist. So we have to fall back to the default (Charset.defaultCharset() is UTF-8 by default from Java 18 and up, but it's the native encoding in Java 17 and earlier).
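For a runnable variant of the lookup just described, the sketch below wraps it in a helper that also falls back when the property is missing or names a charset this JVM does not know. The class and method names are hypothetical; only System.getProperty, Charset.forName and Charset.defaultCharset are real API.

```java
import java.nio.charset.Charset;

class NativeCharset {
    // Returns the charset called `name`, or `fallback` if the name is missing or unusable.
    static Charset resolve(String name, Charset fallback) {
        if (name == null || name.isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
            return fallback;
        }
    }

    // native.encoding if the JVM provides it, otherwise the default charset.
    static Charset detect() {
        return resolve(System.getProperty("native.encoding"), Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("resolved        = " + detect());
    }
}
```

Running main on the machine in question prints what the JVM believes about its environment, which is a quick first check before digging into the terminal's own settings.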

If that still does not work, find out which charset encoding your tty is using, and then apply that in your Java code too. How do you do that? Depends on what tty you are using. On Linux it's generally just: type set, hit enter, and check the various environment vars; one of them (typically LANG or one of the LC_* variables) will mention it.

To complicate matters, when you run apps inside the IDE, they generally use their own charset encoding (usually UTF-8), even if the system encoding is something else. native.encoding does not, as far as I know, fix things.

The go-to strategy to solve all this business is to use the console instead (System.console()), but if the aim is to make the app executable inside the IDE too, all IDEs... fail to implement it.
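A hedged sketch of the console approach with a fallback follows. System.console() returns null when the JVM has no console attached, so the helper (whose name is made up) falls back to wrapping System.out; in that case the usual charset caveats described above still apply.

```java
import java.io.Console;
import java.io.PrintWriter;

class ConsoleOut {
    // Returns the console's writer if a console is attached, else a writer over System.out.
    static PrintWriter writer() {
        Console console = System.console();
        if (console != null) {
            return console.writer();
        }
        // No console: this PrintWriter encodes with the JVM's default charset.
        return new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = writer();
        out.println("\u00FC\u00E4\u00DF"); // üäß
        out.flush();
    }
}
```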

In essence, writing non-ASCII to a console from a java app is a disaster right now.

Tags: java