Decoding Base64 String in Java

I'm using Java and I have a Base64-encoded string that I want to decode and then transform with some further operations.

In JavaScript I get the correct decoded value with atob(), but in Java I cannot get the same value using Base64.decodeBase64() (from Apache Commons Codec).

Example:

For:

String str = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=";

With JavaScript atob(str) I get ->

"Æ‘û$‚SF3«àö“Bâ"

With Java new String(Base64.decodeBase64(str)) I get ->

"Æ?û$?SF3«à§ö?â"

Another way I could fix the issue is to run the JavaScript from Java with the Nashorn engine, but I get an error near the "$" symbol.

Current Code:

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
String script2 = "function decoMemo(memoStr){ print(atob(memoStr).split('')" +
    ".map((aChar) => `0${aChar.charCodeAt(0).toString(16)}`" +
    ".slice(-2)).join('').toUpperCase());}";
try {
    engine.eval(script2);
    Invocable inv = (Invocable) engine;
    String returnValue = (String) inv.invokeFunction("decoMemo", memoTest);
    System.out.print("\n result: " + returnValue);
} catch (ScriptException | NoSuchMethodException e1) {
    e1.printStackTrace();
}

Any help would be appreciated. I searched in a lot of places but couldn't find the correct answer.

Answer:

atob (like its counterpart btoa) is broken and shouldn't be used.

The problem is, bytes aren't characters. Base64 encoding does only one thing: it converts bytes to a stream of characters that survive just about any text-based transport mechanism. Base64 decoding does that one thing in reverse: it converts such characters back into bytes.
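
To see that Base64 is only a lossless bytes-to-text-to-bytes mapping, here is a minimal sketch. It uses java.util.Base64 (Java 8+) rather than the Commons Codec class from the question, and the byte values are arbitrary examples chosen for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Encode arbitrary bytes to Base64 text, then decode that text back to bytes.
    static byte[] roundTrip(byte[] original) {
        // The text contains only A-Z, a-z, 0-9, '+', '/' and '=' padding.
        String text = Base64.getEncoder().encodeToString(original);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Arbitrary example bytes, including zeros and values above 0x7F.
        byte[] original = {0x00, 0x00, (byte) 0xC6, (byte) 0x91, (byte) 0xFB, 0x24};
        System.out.println(Base64.getEncoder().encodeToString(original));
        System.out.println(Arrays.equals(original, roundTrip(original))); // true
    }
}
```

No character set is involved at any point: the bytes that come out are exactly the bytes that went in.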

And the confusion is, you're printing those bytes as if they are characters. They are not.

You end up with exactly the same bytes in both languages, but JavaScript and Java disagree on how to turn them into an ersatz string, which only comes up because you're trying to print them to a console. That's a mistake: bytes aren't characters. Some charset has to be used to map the bytes to characters (in Java, new String(byte[]) silently uses the platform default charset), and you don't want any of this, because these bytes clearly aren't meant to be printed like that.
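
The dependence on the charset can be shown directly. This sketch decodes the question's string once and then turns the same bytes into text with two explicit charsets. It assumes the spaces in the question's string were '+' characters lost in copying, and it uses java.util.Base64 instead of Commons Codec.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CharsetComparison {
    // Assumption: the spaces in the question's string were '+' characters lost in copying.
    static final String STR = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=";

    public static void main(String[] args) {
        byte[] data = Base64.getDecoder().decode(STR);

        // new String(data) with no charset argument uses this platform default.
        System.out.println("default charset: " + Charset.defaultCharset());

        // ISO-8859-1 maps every byte to exactly one char.
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);
        // UTF-8 replaces byte sequences it cannot decode with U+FFFD.
        String utf8 = new String(data, StandardCharsets.UTF_8);

        System.out.println(data.length + " bytes");
        System.out.println(latin1.length() + " chars via ISO-8859-1");
        System.out.println(utf8.length() + " chars via UTF-8");
        System.out.println("same text? " + latin1.equals(utf8));
    }
}
```

Since JDK 18 the default charset is UTF-8 (JEP 400); on older JDKs it depends on the operating system's locale, so the same new String(data) call can give different results on different machines. Printing to a console then adds a second, separate encoding step.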

JavaScript sort of half-equates characters and bytes and will freely convert one to the other: atob() returns a "binary string" in which each character's code is simply the value of one byte (0-255), effectively a Latin-1 reading of the data. Oof. JavaScript sucks in this regard; it is what it is. The MDN docs on btoa explain why you shouldn't use it, and you're running into that problem.
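
If you really do want a Java string that matches what atob() produces, decoding with ISO-8859-1 gives that one-char-per-byte mapping, because ISO-8859-1's 256 characters coincide with U+0000 to U+00FF. A minimal sketch, again assuming the spaces in the question's string were lost '+' characters; the method name atobLike is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class AtobEquivalent {
    // Hypothetical helper mimicking atob(): one char per byte, char code == unsigned byte value.
    static String atobLike(String base64) {
        byte[] data = Base64.getDecoder().decode(base64);
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumption: spaces in the question's string were '+' characters lost in copying.
        String str = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=";
        String s = atobLike(str);

        // Every char code equals the corresponding byte, so converting back is lossless.
        byte[] back = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(back, Base64.getDecoder().decode(str))); // true

        // The byte 0x91 becomes U+0091, a C1 control character, as it does with atob().
        System.out.println(s.indexOf('\u0091') >= 0); // true
    }
}
```

This only reproduces the string; it does not make the characters printable. Anything that displays U+0091 as a curly quote is applying yet another charset (such as windows-1252) on the way to the screen.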

I'm not entirely sure how you'd fix it in JavaScript, but perhaps you don't need to. Java decodes the bytes perfectly well, and so does JavaScript; JavaScript then turns those bytes into characters in a rather silly fashion, and that's what causes the problem.
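
The question's decoMemo script prints each decoded byte as two uppercase hex digits. Assuming that hex output is what is actually wanted, the same result can be computed in plain Java without a JavaScript engine. This is a sketch, not the asker's code; it uses java.util.Base64 and assumes the spaces in the question's string were lost '+' characters.

```java
import java.util.Base64;

public class DecoMemo {
    // Java counterpart of the question's decoMemo(): decode Base64, then render
    // each byte as two uppercase hex digits, with no JavaScript engine involved.
    static String decoMemo(String memoStr) {
        byte[] data = Base64.getDecoder().decode(memoStr);
        StringBuilder hex = new StringBuilder(data.length * 2);
        for (byte b : data) {
            // & 0xFF because Java bytes are signed; this yields 0..255.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: spaces in the question's string were '+' characters lost in copying.
        System.out.println(decoMemo("AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI="));
    }
}
```

Working on the byte array directly sidesteps the charset question entirely, which is the point of this answer.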

Answer:

What you have there is not a text string at all. The giveaway is the run of A characters at the beginning: in Base64 each A encodes six zero bits, so they decode to a run of zero bytes. That doesn't translate to meaningful text in any standard character set.
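
The link between the leading As and zero bytes is easy to check. A minimal sketch; the helper leadingZeroBytes is hypothetical, and the question's string is assumed to have had '+' where the spaces appear.

```java
import java.util.Base64;

public class LeadingZeros {
    // Hypothetical helper: counts how many bytes at the start of the array are zero.
    static int leadingZeroBytes(byte[] data) {
        int n = 0;
        while (n < data.length && data[n] == 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Each 'A' carries six zero bits, so four 'A's decode to three zero bytes.
        System.out.println(Base64.getDecoder().decode("AAAA").length); // 3
        // Assumption: spaces in the question's string were '+' characters lost in copying.
        byte[] data = Base64.getDecoder().decode("AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=");
        System.out.println(leadingZeroBytes(data) + " of " + data.length + " bytes are leading zeros");
    }
}
```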

So what you have there is most likely binary data. Converting it to a string is not going to give you meaningful text.

Now to explain the difference you are seeing between Java and JavaScript: it looks to me as if both are making a "best effort" attempt to convert the binary data as if it were encoded in ISO-8859-1 (aka ISO Latin-1).

The problem is that some of the byte values map to codes that have no printable character in ISO-8859-1.

In the Java case those unassigned codes are being mapped to ?, either when the string is created or when it is being output.
In the JavaScript case, either the unassigned codes are not included in the string, or they are being removed when you attempt to display them.

For the record, this is how an online Base64 decoder rendered the above for me:

  ����������������Æû$SF3«àöBâ

The unassigned codes are 0x91, 0x82 and 0x93 (the 0x80-0x9F range has no printable characters in ISO-8859-1). 0x15 and 0x08 are non-printing control codes.
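
Which decoded bytes fall into the control ranges can be listed programmatically rather than read off a decoder's web page. A sketch under the usual assumption that the spaces in the question's string were lost '+' characters; the helper name controlBytes is hypothetical.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class ControlBytes {
    // Hypothetical helper: returns the non-zero bytes that are control codes rather than
    // printable characters when read as ISO-8859-1: C0 (0x01-0x1F, 0x7F) and C1 (0x80-0x9F).
    static List<String> controlBytes(byte[] data) {
        List<String> found = new ArrayList<>();
        for (byte b : data) {
            int v = b & 0xFF; // unsigned value 0..255
            if (v == 0) {
                continue; // skip zero bytes (here, the leading run)
            }
            boolean c0 = v < 0x20 || v == 0x7F;
            boolean c1 = v >= 0x80 && v <= 0x9F;
            if (c0 || c1) {
                found.add(String.format("0x%02X", v));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: the spaces in the question's string were '+' characters lost in copying.
        byte[] data = Base64.getDecoder().decode("AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=");
        System.out.println(controlBytes(data));
    }
}
```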

But the bottom line is that you should not be converting this data into a string in either Java or in Javascript. It should be treated as binary; i.e. an array of byte values.
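
When you do work with the data as a byte array, note that Java's byte is signed, so a value like 0xC6 reads back as a negative number. Masking with 0xFF gives the 0-255 value that charCodeAt(0) returned in the question's JavaScript. A minimal sketch; the helper name unsigned is hypothetical.

```java
public class UnsignedBytes {
    // Hypothetical helper: Java's byte type is signed (-128..127); masking with 0xFF
    // yields 0..255, the same number charCodeAt(0) gives for an atob() character.
    static int unsigned(byte b) {
        return b & 0xFF; // same result as Byte.toUnsignedInt(b), available since Java 8
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC6; // one of the bytes from the decoded data
        System.out.println(b);            // -58
        System.out.println(unsigned(b));  // 198
        System.out.println(Integer.toHexString(unsigned(b)).toUpperCase()); // C6
    }
}
```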

Answer:

import java.util.Base64;
byte[] data = Base64.getDecoder().decode(str); // keep the result as bytes; don't wrap it in new String(...)