Someone I work with had a bytes string (bstring) and saved it to a file (along with other data). Later, I open the file with pandas and try to read the bstring, but it has been converted to a string:
import pandas as pd
# Saving value
bstring = b"\x0F\xC8\x3F\x7C\x00"
to_write = 'bstrings\n'
to_write += str(bstring)  # append the value below the header line
with open('test.csv', "w") as f_csv:
    f_csv.write(to_write)
# Reading value
my_df = pd.read_csv('test.csv')
bstring2 = my_df['bstrings'][0]
print(bstring)
print(type(bstring))
print(bstring2)
print(type(bstring2))
print(bstring == bstring2)
The output is:
b'\x0f\xc8?|\x00'
<class 'bytes'>
b'\x0f\xc8?|\x00'
<class 'str'>
False
bstring2 is now a string containing the characters b'\x0f\xc8?|\x00' (including the b and the quotation marks), not a bytes object.
How do I transform bstring2 back to the binary bstring?
I tried bytes.fromhex, which raises a ValueError.
I found a few related questions, but they did not seem to answer my question.
CodePudding user response:
You can take any string representing a legal Python literal and convert it back to what it represents with ast.literal_eval.
So in your code, just change:
bstring2 = my_df['bstrings'][0]
to:
bstring2 = ast.literal_eval(my_df['bstrings'][0])
and add import ast to the top of your file; bstring2 will then store the same value as bstring.
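A minimal sketch of that round trip, without pandas so it runs on its own. The helper name parse_bytes_literal is hypothetical (not part of ast); it only adds a type check on top of ast.literal_eval so that a cell holding some other literal is rejected instead of silently accepted.

```python
import ast


def parse_bytes_literal(text):
    """Turn text such as "b'\\x0f...'" back into bytes (hypothetical helper)."""
    # literal_eval only evaluates Python literals, never arbitrary code.
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value


original = b"\x0F\xC8\x3F\x7C\x00"
as_text = str(original)                  # what ended up in the CSV cell
restored = parse_bytes_literal(as_text)

print(type(restored))                    # <class 'bytes'>
print(restored == original)              # True
```

With pandas, the same helper could presumably be applied to the value read from my_df['bstrings'][0].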
CodePudding user response:
Encoding a bytes object with str() is inefficient: the result is meant to be read by humans, not parsed by programs. It is better to use base16 or base64 encoding.
import pandas as pd
import base64
# Saving value
bstring = b"\x0F\xC8\x3F\x7C\x00"
to_write = 'bstrings\n'
to_write += base64.b64encode(bstring).decode('utf-8')  # append the value below the header line
with open('test.csv', "w") as f_csv:
    f_csv.write(to_write)
# Reading value
my_df = pd.read_csv('test.csv')
bstring2 = base64.b64decode(my_df['bstrings'][0].encode('utf-8'))
print(bstring)
print(type(bstring))
print(bstring2)
print(type(bstring2))
print(bstring == bstring2)
Base64 is a compact way to store bytes in text format (roughly 4 characters for every 3 bytes). Many other encodings are available, though; take a look at the base64 module docs (Standard Library).
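For comparison, here is a rough sketch of the base16 (hex) variant next to base64. It uses the standard-library csv module and an in-memory buffer in place of pandas and test.csv, purely so the example is self-contained; the column names hex and b64 are made up for illustration.

```python
import base64
import csv
import io

bstring = b"\x0F\xC8\x3F\x7C\x00"

# Encode the same bytes two ways; both results are plain ASCII text.
as_hex = bstring.hex()                               # 2 characters per byte
as_b64 = base64.b64encode(bstring).decode("ascii")   # about 4 characters per 3 bytes

# Write a header and one row to an in-memory CSV (stands in for test.csv).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["hex", "b64"])
writer.writerow([as_hex, as_b64])

# Read it back; csv.DictReader always yields strings.
buffer.seek(0)
row = next(csv.DictReader(buffer))

from_hex = bytes.fromhex(row["hex"])
from_b64 = base64.b64decode(row["b64"])

print(from_hex == bstring, from_b64 == bstring)      # True True
```

One caveat when reading such a column with pandas: a hex value made up only of digits can be inferred as a number, so passing dtype=str to pd.read_csv is probably the safer choice.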