Hello everyone. My code converts Latin alphabet characters to binary, but it crashes when I try to convert non-Latin characters. Can you help me make it work for every alphabet?
fun strToBinary(str: String): String {
    val builder = StringBuilder()
    for (c in str.toCharArray()) {
        val toString = c.code.toString(2) // get char value in binary
        builder.append(String.format("%08d", Integer.parseInt(toString))) // pad with zeros to 8 digits
    }
    return builder.toString()
}
When I try non-Latin characters, it throws this exception:
Exception in thread "main" java.lang.NumberFormatException: For input string: "11000100011"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at BinaryKt.strToBinary(Binary.kt:6)
at BinaryKt.main(Binary.kt:41)
at BinaryKt.main(Binary.kt)
CodePudding user response:
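You assumed that a character always takes 1 byte. That is only true for ASCII; in UTF-8, other characters take 2 to 4 bytes each. The exception itself comes from Integer.parseInt, which reads the binary digits as a decimal number: 11000100011 is larger than Int.MAX_VALUE (2147483647), so parsing fails.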
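As a rough illustration of the byte counts mentioned above, the sketch below prints how many UTF-8 bytes a few sample strings occupy. The helper name `utf8Length` and the sample characters are chosen here for illustration only; they are not part of the original code.

```kotlin
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
fun utf8Length(s: String): Int = s.toByteArray(Charsets.UTF_8).size

fun main() {
    // Sample strings (illustrative), written as escapes to avoid editor encoding issues.
    val samples = listOf(
        "A",             // ASCII letter
        "\u00E9",        // Latin small letter e with acute
        "\u0623",        // Arabic letter alef with hamza above
        "\u4E2D",        // CJK ideograph
        "\uD83D\uDE00"   // emoji U+1F600, stored as a surrogate pair
    )
    for (s in samples) {
        println("$s -> ${utf8Length(s)} UTF-8 byte(s), ${s.length} UTF-16 char(s)")
    }
}
```

With these samples, the byte counts come out as 1, 2, 2, 3 and 4. The last sample also shows that a single visible character can occupy two Kotlin `Char` values, which is another reason to work on bytes rather than on chars.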
I suggest first encoding the whole string into a ByteArray using UTF-8 and then converting it byte by byte:
fun strToBinary(str: String) = buildString {
    str.toByteArray().forEach {
        append(
            it.toUByte()
                .toString(2)
                .padStart(8, '0')
        )
    }
}
Note that this encoding takes quite a lot of space: the resulting string is at least 8 times longer than the original text. You can use hex encoding to make it 4 times shorter:
fun strToHex(str: String) = buildString {
    str.toByteArray().forEach {
        append(
            it.toUByte()
                .toString(16)
                .padStart(2, '0')
        )
    }
}
Or make it even shorter using Base64 encoding:
import java.util.Base64

fun strToBase64(str: String) = Base64.getEncoder().encodeToString(str.toByteArray())
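To get a feel for the size difference, here is a small sketch that runs the three encoders on the same input and prints the length of each result. The functions are repeated so the snippet runs on its own, and the sample text "Hello" is just an assumption picked for the example.

```kotlin
import java.util.Base64

fun strToBinary(str: String) = buildString {
    str.toByteArray().forEach { append(it.toUByte().toString(2).padStart(8, '0')) }
}

fun strToHex(str: String) = buildString {
    str.toByteArray().forEach { append(it.toUByte().toString(16).padStart(2, '0')) }
}

fun strToBase64(str: String): String =
    Base64.getEncoder().encodeToString(str.toByteArray())

fun main() {
    val text = "Hello" // 5 bytes in UTF-8
    println(strToBinary(text).length) // 40: 8 characters per byte
    println(strToHex(text).length)    // 10: 2 characters per byte
    println(strToBase64(text).length) // 8: 4 characters per 3 bytes, padded
}
```

For longer inputs Base64 settles at roughly 4 output characters for every 3 input bytes, so it stays noticeably shorter than hex.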
Update
To decode the string, we basically need to reverse all the steps. For example, to decode from binary, we chunk the string into 8-character parts, decode each part into a single byte, build a byte array, and then decode it into a string using UTF-8:
fun binaryToStr(binary: String) =
    binary.chunked(8)
        .map { it.toUByte(2).toByte() }
        .toByteArray()
        .decodeToString()
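The same reversal should work for the hex variant. The sketch below is one possible way to do it: `hexToStr` is a hypothetical name that does not appear in the answer above, and the encoders are repeated so the round trip can be run as-is.

```kotlin
fun strToBinary(str: String) = buildString {
    str.toByteArray().forEach { append(it.toUByte().toString(2).padStart(8, '0')) }
}

fun binaryToStr(binary: String) =
    binary.chunked(8)
        .map { it.toUByte(2).toByte() }
        .toByteArray()
        .decodeToString()

fun strToHex(str: String) = buildString {
    str.toByteArray().forEach { append(it.toUByte().toString(16).padStart(2, '0')) }
}

// Hypothetical inverse of strToHex: every 2 hex digits become one byte.
fun hexToStr(hex: String) =
    hex.chunked(2)
        .map { it.toUByte(16).toByte() }
        .toByteArray()
        .decodeToString()

fun main() {
    val original = "\u0623" // U+0623, the code point whose binary form appears in the stack trace
    val binary = strToBinary(original)
    println(binary)                        // 1101100010100011 (two UTF-8 bytes)
    println(binaryToStr(binary) == original)             // true
    println(hexToStr(strToHex(original)) == original)    // true
}
```

Both decoders assume well-formed input: a binary string whose length is not a multiple of 8, or a chunk containing characters outside the expected radix, will make `toUByte` throw a `NumberFormatException`.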