I'm trying to port some Scala code to Python 3, and I'm currently stuck on decoding a Base64 string. The output of Python's base64.b64decode does not match Scala's output.
Scala:
import org.apache.commons.codec.binary.Base64.decodeBase64
val coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="
decodeBase64(coded_str)
//Output 1 :
res1: Array[Byte] = Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)
coded_str.getBytes()
//Output 2
res2: Array[Byte] = Array(85, 103, 75, 103, 68, 119, 104, 111, 69, 65, 65, 65, 78, 65, 69, 65, 49, 116, 89, 65, 65, 68, 65, 66, 65, 66, 111, 66, 65, 66, 77, 65, 65, 65, 65, 65, 65, 81, 65, 65, 65, 65, 69, 65, 65, 81, 65, 67, 65, 65, 65, 65, 65, 65, 68, 54, 115, 84, 52, 65, 79, 48, 89, 65, 65, 65, 61, 61)
In Python, I tried:
import base64
coded_str = 'UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=='
print(base64.b64decode(coded_str))
#Output 1 :
b'R\x02\xa0\x0f\x08h\x10\x00\x004\x01\x00\xd6\xd6\x00\x000\x01\x00\x1a\x01\x00\x13\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x00\x02\x00\x00\x00\x00\x00\xfa\xb1>\x00;F\x00\x00'
#Command 2:
b = [ord(s) for s in coded_str]
print(b)
#Output 2
[85, 103, 75, 103, 68, 119, 104, 111, 69, 65, 65, 65, 78, 65, 69, 65, 49, 116, 89, 65, 65, 68, 65, 66, 65, 66, 111, 66, 65, 66, 77, 65, 65, 65, 65, 65, 65, 81, 65, 65, 65, 65, 69, 65, 65, 81, 65, 67, 65, 65, 65, 65, 65, 65, 68, 54, 115, 84, 52, 65, 79, 48, 89, 65, 65, 65, 61, 61]
I'm trying to get Output 1 in Python to match Scala's.
Output 2 matches, but I don't know how to convert it from there.
Any help would be appreciated. Thanks!
In other words, I want Python to produce the same result as Scala:
Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)
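About Output 2: those numbers are just the ASCII codes of the Base64 text itself, which is what Java's String.getBytes() returns. A minimal sketch of the Python equivalent, and of going from that list back to the decoded bytes, assuming the input is pure ASCII (Base64 text always is, so the charset choice does not change the result):

```python
import base64

coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="

# Python counterpart of Scala's coded_str.getBytes(). Base64 text is pure ASCII,
# so encoding as ASCII yields the same bytes as any ASCII-compatible charset.
encoded = coded_str.encode("ascii")

# The same integers as "Output 2".
output2 = list(encoded)

# From Output 2 back to the decoded payload: rebuild the bytes, then decode.
# base64.b64decode accepts both str and bytes input.
decoded = base64.b64decode(bytes(output2))
print(decoded == base64.b64decode(coded_str))  # True
```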
Answer:
You get the same output; it's just bytes:
import base64
coded_str = 'UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=='
decoded_str = base64.b64decode(coded_str)
# in Python 3, iterating over bytes already yields unsigned ints (0..255); calling ord() on them would raise TypeError
bytes_unsigned = list(decoded_str)
# but Java/Scala bytes are signed (-128..127), which takes a tiny bit more effort:
import struct
bytes_match = struct.unpack(f"{len(decoded_str)}b",decoded_str)
print(bytes_match)
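To check this directly, here is a small sketch that compares the struct result with the res1 array copied from the question, adds a struct-free alternative, and shows the reverse conversion. The variable names are illustrative only.

```python
import base64
import struct

coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="
decoded = base64.b64decode(coded_str)

# res1 as printed by Scala in the question.
scala_res1 = [82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0,
              26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0,
              -6, -79, 62, 0, 59, 70, 0, 0]

# Option 1: struct; the "b" format code is a signed char (-128..127).
signed_struct = list(struct.unpack(f"{len(decoded)}b", decoded))

# Option 2: no struct; values above 127 wrap around by 256 (8-bit two's complement).
signed_manual = [b - 256 if b > 127 else b for b in decoded]

print(signed_struct == scala_res1, signed_manual == scala_res1)  # True True

# The other direction (Scala's signed values -> Python bytes): mask each to 0..255.
back = bytes(b & 0xFF for b in scala_res1)
print(back == decoded)  # True
```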
Answer:
No, it is the same.
This:
b'R\x02\xa0\x0f\x08h\x10\x00\x004\x01\x00\xd6\xd6\x00\x000\x01\x00\x1a\x01\x00\x13\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x00\x02\x00\x00\x00\x00\x00\xfa\xb1>\x00;F\x00\x00'
and this:
res1: Array[Byte] = Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)
are in fact the exact same sequence.
82 is the ASCII code for capital R. Hence, 82 on the Scala side and the R (the first character in your Python bytes literal) both mean: "a byte whose value is 82".
The second byte is \x02 on the Python side and 2 on the Scala side. Same thing: the character with code 2 is not printable, so Python shows it as \x02. It's the same byte.
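A quick way to see this in Python 3: indexing a bytes object gives the integer value, and only the printed form uses characters and escapes. A minimal sketch:

```python
import base64

decoded = base64.b64decode("UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA==")

# Indexing bytes in Python 3 returns an int, not a one-character string.
print(decoded[0], decoded[1])   # 82 2

# 82 is the code for "R", which is why the bytes repr shows an R there.
print(decoded[0] == ord("R"))   # True

# Byte 2 is not printable, so the repr falls back to a \x escape.
print(decoded[:2])              # b'R\x02'
```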
And so on. -96 is the same as \xa0: \xa0 states the byte in unsigned hexadecimal, while -96 states the exact same bit pattern read as a two's complement signed number. To get that bit pattern from -96, take 96 = 0110 0000, flip all the bits (1001 1111), then add 1 (1010 0000). That is 128 + 32 = 160 (equivalently, 256 - 96 = 160), and 160 is exactly 10 ('a') times 16 with nothing left over, so in hex it is \xa0.
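The same arithmetic can be checked with Python built-ins; a small sketch:

```python
# -96 (Scala's signed view) and 0xa0 (Python's unsigned view) are the same 8 bits.
signed = -96

# Signed -> unsigned: keep the low 8 bits (same as adding 256 to a negative value).
unsigned = signed & 0xFF
print(unsigned, hex(unsigned), format(unsigned, "08b"))  # 160 0xa0 10100000

# The manual route described above: flip the bits of 96, then add 1.
flipped = ~96 & 0xFF
print(format(flipped, "08b"), flipped + 1)  # 10011111 160

# Unsigned -> signed: read the single byte as a signed integer.
print(int.from_bytes(bytes([unsigned]), "big", signed=True))  # -96
```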
That 70 near the end is an F in the Python string because 70 is the code for capital F, and so on.
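To walk the whole sequence rather than byte by byte, here is a sketch that pairs each Scala value with how Python displays the same byte inside a bytes literal. The slicing of repr() is just a convenience to strip the surrounding b'...'.

```python
import base64

decoded = base64.b64decode("UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA==")

rows = []
for unsigned in decoded:
    signed = unsigned - 256 if unsigned > 127 else unsigned  # Scala's signed view
    shown = repr(bytes([unsigned]))[2:-1]                    # Python's display: R, \x02, \xa0, ...
    rows.append((signed, shown))

for signed, shown in rows:
    print(f"{signed:5d}  {shown}")
```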
In general, don't attempt to print raw bytes like this, as it's just confusing.
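If you do need to inspect the bytes, printing them as plain numbers or as hex avoids the mix of characters and escapes. A sketch using only built-in bytes methods:

```python
import base64

decoded = base64.b64decode("UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA==")

# One unsigned integer (0..255) per byte.
print(list(decoded))

# Two hex digits per byte; starts with 5202a0 (R, \x02, \xa0).
print(decoded.hex())

# The hex form round-trips back to the original bytes.
print(bytes.fromhex(decoded.hex()) == decoded)  # True
```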