How to convert a bytes array to str or JSON?
I have this Python bytes object and I need to convert it to JSON or string format. How can I do that?
b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
CodePudding user response:
This looks like arbitrary binary data rather than encoded text, so one way of storing it in JSON is to use base64 encoding. The base64 algorithm ensures that every element of the result is a printable ASCII character, but the result is still a bytes object, so .decode('ascii') is used to convert the ASCII bytes to a Unicode str of ASCII characters that can be placed in an object destined for JSON.
Example:
import base64
import json
data = b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
j = {'data':base64.b64encode(data).decode('ascii')}
s = json.dumps(j)
print(s) # resulting JSON text
# restore back to binary data
j2 = json.loads(s)
data2 = base64.b64decode(j2['data'])
print(data2 == data)
Output:
{"data": "eNoEwLENxCAMheFd/poGrvM2J0IRyUokP7tC7J5vsyLewJ5yb8RSeWIb5T9LGKo5l8Rp3BfWx6+P8wUAAP//IGwSbA=="}
True
Simpler, but with a longer result, is to use data.hex() to get a hexadecimal string representation and bytes.fromhex() to convert that back to bytes:
>>> s = data.hex()
>>> s
'78da04c0b10dc4200c85e15dfe9a06aef336274211c94a243fbb42ec9e6fb322dec09e726fc45279621be53f4b18aa3997c469dc17d6c7af8ff3050000ffff206c126c'
>>> data2 = bytes.fromhex(s)
>>> data2
b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
>>> data2 == data
True
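If the bytes value sits inside a larger structure, the base64 step can be applied automatically through the default hook of json.dumps. The following is only a sketch: the helper name encode_bytes and the sample record are made up for illustration, and decoding on the way back still has to be done by hand for the fields you know are binary.

```python
import base64
import json

def encode_bytes(obj):
    """Fallback for json.dumps: turn bytes into a base64 ASCII string."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode('ascii')
    # Anything else that json cannot handle is still an error.
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')

# Made-up record; the payload is just the first bytes of the question's data.
record = {'name': 'sample', 'payload': b'x\xda\x04\xc0\xb1'}

s = json.dumps(record, default=encode_bytes)
print(s)

# Restoring is manual: decode the field known to hold base64 text.
restored = base64.b64decode(json.loads(s)['payload'])
print(restored == record['payload'])
```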
CodePudding user response:
Use the decode() method of the bytes object and provide the encoding that was used as an argument. This only works if the bytes actually are text in that encoding.
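A minimal sketch of when decode() applies and when it does not. The text_bytes value is a made-up example, not taken from the question; the second part uses the first few bytes of the question's data, which are not valid UTF-8.

```python
import json

# Hypothetical example: bytes that really are encoded text (UTF-8 here).
text_bytes = 'héllo'.encode('utf-8')
text = text_bytes.decode('utf-8')       # bytes -> str
as_json = json.dumps({'data': text})    # a str is JSON-serializable

# First bytes of the question's data: binary, not UTF-8 text.
data = b'x\xda\x04\xc0'
try:
    data.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    # \xda starts a two-byte sequence but \x04 is not a continuation byte.
    decoded_ok = False

print(text)
print(as_json)
print(decoded_ok)
```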
CodePudding user response:
You don't have to convert the binary data using the base64 encoding algorithm nor into a hexadecimal string as @Mark Tolonen suggests in his answer — both of which require substantially more space to represent the data than the original.
Instead, you can take advantage of the fact that JSON strings are "a sequence of zero or more Unicode characters" (per the JSON spec). The latin1 encoding maps each of the 256 possible byte values to the Unicode code point with the same number, so you can "decode" the binary data as latin1 and then "encode" it back to the original binary data.
Here's what I mean:
import json
data = b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'
j = {'data': data.decode('latin1')}
s = json.dumps(j)
print(s) # resulting JSON text
# restore back to binary data
j2 = json.loads(s)
data2 = j2['data'].encode('latin1')
assert data2 == data # Should be identical.
Here's how the lengths compare for your sample data (measured in characters of the Python string, before JSON serialization):
import base64
print(f"{len(data)}") # -> 67
print(f"{len(data.decode('latin1'))}") # -> 67
print(f"{len(base64.b64encode(data).decode('ascii'))}") # -> 92
print(f"{len(data.hex())}") # -> 134
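The len() calls above count characters of the Python strings, not the size of the JSON text that is actually stored or sent. By default json.dumps escapes non-ASCII and control characters as \uXXXX, and with ensure_ascii=False the non-ASCII characters take two bytes each in UTF-8, so the latin1 form can end up larger than base64 once serialized. A rough sketch for measuring this on your own data (the variable names are only illustrative):

```python
import base64
import json

data = b'x\xda\x04\xc0\xb1\r\xc4 \x0c\x85\xe1]\xfe\x9a\x06\xae\xf36\'B\x11\xc9J$?\xbbB\xec\x9eo\xb3"\xde\xc0\x9ero\xc4Ryb\x1b\xe5?K\x18\xaa9\x97\xc4i\xdc\x17\xd6\xc7\xaf\x8f\xf3\x05\x00\x00\xff\xff l\x12l'

# Serialize each representation to JSON text.
latin1_json = json.dumps({'data': data.decode('latin1')})  # default: ensure_ascii=True
latin1_raw = json.dumps({'data': data.decode('latin1')}, ensure_ascii=False)
b64_json = json.dumps({'data': base64.b64encode(data).decode('ascii')})
hex_json = json.dumps({'data': data.hex()})

# Size in bytes of each JSON document when written out as UTF-8.
for name, text in [('latin1 (escaped)', latin1_json),
                   ('latin1 (ensure_ascii=False)', latin1_raw),
                   ('base64', b64_json),
                   ('hex', hex_json)]:
    print(name, len(text.encode('utf-8')))

# All variants still round-trip to the original bytes.
assert json.loads(latin1_json)['data'].encode('latin1') == data
assert json.loads(latin1_raw)['data'].encode('latin1') == data
```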
✶ Note that I learned about the encoding trick from an answer by @Sven Marnach to a question about serializing binary data long ago (and have used it multiple times since).