Python : binary created in hexadecimals converted to string : how to put it back to binary?

Time:06-18

A colleague had a bytes object and saved it to a file (along with other data). Later, I open the file with pandas and try to read the value back, but it has been converted to a string:

import pandas as pd

# Saving value
bstring = b"\x0F\xC8\x3F\x7C\x00"
to_write = 'bstrings\n'
to_write += str(bstring)
with open('test.csv', "w") as f_csv:
    f_csv.write(to_write)

# Reading value
my_df = pd.read_csv('test.csv')
bstring2 = my_df['bstrings'][0]
print(bstring)
print(type(bstring))
print(bstring2)
print(type(bstring2))
print(bstring == bstring2)

The output is:

b'\x0f\xc8?|\x00'
<class 'bytes'>
b'\x0f\xc8?|\x00'
<class 'str'>
False

bstring2 is now a string containing the characters b'\x0f\xc8?|\x00' (including the b and the quotation marks), not a bytes object.

How do I convert bstring2 back to the bytes object bstring?

I tried bytes.fromhex, which raises a ValueError.

I found a few related questions but they did not seem to answer my question.

CodePudding user response:

You can take any string representing a legal Python literal and convert it back to what it represents with ast.literal_eval.

So in your code, just change:

bstring2 = my_df['bstrings'][0]

to:

bstring2 = ast.literal_eval(my_df['bstrings'][0])

Add import ast at the top of your file, and bstring2 will hold the same value as bstring.
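As a minimal sketch of that round trip without the CSV file, the snippet below wraps ast.literal_eval in a small helper. The name parse_bytes_repr is hypothetical (it is not part of ast or pandas), and the type check is an added assumption so that a cell holding some other literal, such as a number, is rejected instead of silently returned.

```python
import ast


def parse_bytes_repr(text):
    """Hypothetical helper: turn the text "b'...'" back into a bytes object."""
    # literal_eval only evaluates Python literals, so it does not run arbitrary code.
    value = ast.literal_eval(text)
    # Added assumption: we only want bytes, so reject any other literal type.
    if not isinstance(value, bytes):
        raise TypeError(f"expected a bytes literal, got {type(value).__name__}")
    return value


bstring = b"\x0F\xC8\x3F\x7C\x00"
as_text = str(bstring)              # the text that ended up in the CSV cell
bstring2 = parse_bytes_repr(as_text)

print(type(bstring2))
print(bstring == bstring2)
```

If the cell does not contain a valid literal at all, ast.literal_eval raises ValueError or SyntaxError, so it may be worth catching those when the file comes from an untrusted or hand-edited source.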

CodePudding user response:

Serializing a bytes object with str() produces a representation meant for human readers, not for efficient machine parsing. It is better to use base16 or base64 encoding.

import pandas as pd
import base64

# Saving value
bstring = b"\x0F\xC8\x3F\x7C\x00"
to_write = 'bstrings\n'
to_write += base64.b64encode(bstring).decode('utf-8')
with open('test.csv', "w") as f_csv:
    f_csv.write(to_write)

# Reading value
my_df = pd.read_csv('test.csv')
bstring2 = base64.b64decode(my_df['bstrings'][0].encode('utf-8'))
print(bstring)
print(type(bstring))
print(bstring2)
print(type(bstring2))
print(bstring == bstring2)

Base64 is a compact way to store bytes as text. Other encodings are available; see the documentation for the base64 module in the standard library.
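For comparison, here is a rough sketch of the base16 (hexadecimal) option mentioned above, using only the standard library. It leaves out the CSV step and simply shows the text each encoding produces for the same five bytes; the variable names are illustrative.

```python
import base64

bstring = b"\x0F\xC8\x3F\x7C\x00"

# Base16 via the built-in bytes methods (lowercase hex digits).
hex_text = bstring.hex()
from_hex = bytes.fromhex(hex_text)

# Base16 via the base64 module (uppercase hex digits).
b16_text = base64.b16encode(bstring).decode("ascii")
from_b16 = base64.b16decode(b16_text)

# Base64, for a size comparison.
b64_text = base64.b64encode(bstring).decode("ascii")

print(hex_text, len(hex_text))   # two characters per byte
print(b16_text, len(b16_text))
print(b64_text, len(b64_text))   # about four characters per three bytes
print(from_hex == bstring, from_b16 == bstring)
```

Hex is easier to read and debug, while base64 is shorter. One caveat when reading such a column with pandas: a hex string made only of digits could be inferred as a number, so passing dtype=str to read_csv is a reasonable precaution.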
