Encoded using Base64 with UTF-8 for German umlauts (ä, ö, ü) in Java: will atob() work, or is special handling needed?

Time:07-24

I'm reading the ä, ö, ü characters from a message*.properties file and using the code below to send them to the client:

String encodedString = Base64.getEncoder().encodeToString(str.getBytes(StandardCharsets.UTF_8));

The client (an Angular application) reads the data using the atob method:

const decodedString = atob(encodedString)

Will this work, or do I need any special handling on the client side, i.e. around the atob method?

Also, does the atob() method use any default character set?

Please advise.

Answer:

See the 'Unicode strings' section of the MDN docs on btoa(), which makes it clear: no, that does not work. atob and btoa are essentially broken by design for text and should never be used on their own, except in unlikely, exotic scenarios (such as Base64-encoding plain ASCII, which is rather rare).
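To make this concrete, and to answer the "default character set" question: atob() takes no charset argument. It decodes Base64 to bytes and maps each byte to the character with the same code point (0 to 255), which is effectively ISO-8859-1. A minimal sketch, assuming the Java side sent the Base64 of "ä" ("w6Q=", i.e. the UTF-8 bytes C3 A4):

```javascript
// Base64 of "ä" as produced by the Java code in the question:
// "ä".getBytes(StandardCharsets.UTF_8) is the two bytes C3 A4.
const encoded = "w6Q=";

// atob() has no charset parameter. It returns a "binary string" with one
// character per decoded byte, whose code point equals the byte value (0-255).
const decoded = atob(encoded);

console.log(decoded.length); // 2
console.log(Array.from(decoded, (ch) => ch.charCodeAt(0).toString(16)).join(" ")); // c3 a4
console.log(decoded === "ä"); // false
console.log(decoded); // Ã¤ (U+00C3 U+00A4, classic mojibake)
```

The UTF-8 bytes arrive intact, but atob() leaves them as two Latin-1 characters instead of one "ä".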

You probably want helper functions such as b64_to_utf8 and utf8_to_b64 (described in that MDN section) instead; they are not built into the browser.
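A minimal sketch of such helpers in plain JavaScript. The names b64ToUtf8 and utf8ToB64 are chosen here for illustration, and the UTF-8 step is done with TextDecoder/TextEncoder; it assumes a runtime that provides atob, btoa, TextDecoder and TextEncoder (modern browsers, Node 16+):

```javascript
// Sketch: decode Base64 that carries UTF-8 bytes, e.g. the output of
// Base64.getEncoder().encodeToString(str.getBytes(StandardCharsets.UTF_8)).
function b64ToUtf8(b64) {
  // atob() turns Base64 into a "binary string": one char per byte (0-255).
  const binary = atob(b64);
  // Copy the byte values into a Uint8Array.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Interpret those bytes as UTF-8 to get the real characters back.
  return new TextDecoder("utf-8").decode(bytes);
}

// Sketch of the inverse: UTF-8 encode the string, then Base64 the bytes.
function utf8ToB64(str) {
  const bytes = new TextEncoder().encode(str); // TextEncoder always emits UTF-8
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one char per byte, as btoa() expects
  }
  return btoa(binary);
}

console.log(b64ToUtf8("w6TDtsO8")); // äöü
console.log(utf8ToB64("äöü")); // w6TDtsO8
```

In the Angular client, the question's line would then become `const decodedString = b64ToUtf8(encodedString);`.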

Separately, Java's own properties file format is not necessarily clear about encoding either. It's quite a chain, and an error anywhere in it causes problems:

  1. You create the .properties file with something, perhaps a text editor. Text editors edit text, but files are bytes, so something is doing a charset conversion. Make sure the editor saves the file as UTF-8.
  2. In Java, you usually read a properties file using the rather broken java.util.Properties class. It dates back to Java 1.0 and has all sorts of warts. In particular, if you use the Properties.load(InputStream) variant, which is the commonly used one, e.g.:
try (var in = new FileInputStream("my.properties")) {
  properties.load(in);
}

you've broken it: that variant always reads the file as ISO 8859-1. You must first wrap your InputStream in a Reader:

var properties = new Properties();
try (var in = new FileInputStream("my.properties")) {
  properties.load(new InputStreamReader(in, StandardCharsets.UTF_8));
}
  3. You then fetch the string and Base64-encode its bytes. You have to use .getBytes(UTF_8); otherwise it breaks unless the platform default encoding happens to be UTF-8. Fortunately, you're already doing that.

  4. You then send the Base64 text across the wire. Nobody needs to apply an encoding here (or rather, it will be converted back and forth many times, but since Base64 is all 'safe ASCII', this is highly unlikely to cause any issues).

  5. The Base64 arrives in your JavaScript code. If you decode it with plain atob, you break it; you have to use b64_to_utf8. Finally you've "arrived": a string with the right characters in your JavaScript environment.
