check if a bytes-array is a hex-literal

From an API I receive the following types of byte arrays:

b'2021:09:30 08:28:24'

b'\x01\x02\x03\x00'

I know how to get the values from them: for the first one with value.decode(), and for the second one with ''.join([str(c) for c in value]).
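For illustration, here is what those two conversions produce on the sample values. Iterating over a bytes object yields integers, so the join concatenates decimal byte values. Note that without a separator the result is ambiguous:

```python
first = b'2021:09:30 08:28:24'
second = b'\x01\x02\x03\x00'

# decode() interprets the bytes as text (UTF-8 by default)
print(first.decode())                     # 2021:09:30 08:28:24

# Iterating over a bytes object yields ints, one per byte
print(list(second))                       # [1, 2, 3, 0]
print(''.join([str(c) for c in second]))  # 1230

# Without a separator the joined form is ambiguous: bytes 12 and 30 also give "1230"
print(''.join([str(c) for c in b'\x0c\x1e']))  # 1230
```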

The problem is that I need to do this dynamically. I don't know what the second one is called (is it a hex literal?), and I can't even check value.decode().startswith('\x'); it gives me a

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape

This makes sense, because in a Python string literal \x starts an escape sequence that must be followed by two hex digits.
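A minimal demonstration of why even an escaped check would not help: writing the backslash as '\\x' avoids the SyntaxError, but decode() produces one control character per byte (U+0001 and so on), not a backslash followed by "x". The \x01 notation only appears in repr():

```python
value = b'\x01\x02\x03\x00'
text = value.decode()

# '\\x' is a valid literal (a backslash followed by "x"), so no SyntaxError here,
# but the decoded text contains no backslash at all
print(text.startswith('\\x'))  # False
print(len(text))               # 4: one control character per byte
print(text[0] == '\x01')       # True: the first character is U+0001

# repr() is what renders those characters as \xNN escapes
print(repr(value))             # b'\x01\x02\x03\x00'
```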

So I need something that checks whether the value is in this \x... format or is a plain string I can simply decode().

Answer:

Combine try/except with str.isprintable():

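def bytes_to_string(data):
    try:
        rv = data.decode()  # UTF-8 by default
        # Printable text is returned as-is; control characters fall back to hex
        return rv if rv.isprintable() else data.hex()
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, so show the raw bytes as hex
        return data.hex()

print(bytes_to_string(b'\x01\x02\x03\x00'))    # -> 01020300 (decodes, but to an unprintable string)
print(bytes_to_string(b'2021:09:30 08:28:24')) # -> 2021:09:30 08:28:24 (decodes to a printable string)
print(bytes_to_string(b'Z\xfcrich'))           # -> 5afc72696368 (not valid UTF-8, so decode() raises)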
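One caveat with str.isprintable(): it returns False for newlines, carriage returns and tabs, so multi-line text would also come back as hex. A hypothetical variant (the name bytes_to_string_lenient is made up for this sketch) that tolerates those three characters could look like this:

```python
def bytes_to_string_lenient(data: bytes) -> str:
    """Hypothetical variant: decode as UTF-8 text, tolerating newlines, carriage
    returns and tabs; fall back to hex for anything else."""
    try:
        text = data.decode()
    except UnicodeDecodeError:
        # Not valid UTF-8, so show the raw bytes as hex
        return data.hex()
    # isprintable() rejects \n, \r and \t, so remove them before checking
    stripped = text.replace('\n', '').replace('\r', '').replace('\t', '')
    return text if stripped.isprintable() else data.hex()


print(bytes_to_string_lenient(b'line 1\nline 2'))
print(bytes_to_string_lenient(b'\x01\x02\x03\x00'))  # 01020300
```

Other control characters such as \x00 still trigger the hex fallback, so the heuristic stays as strict as before apart from those three.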

Answer:

You can use isprintable() to approximate a check for the \x escape sequences, which are just how Python displays non-printable characters:

str_val = value.decode()
if not str_val.isprintable():
    str_val = ''.join([str(c) for c in value])

However, as noted in my comment, this seems like an unnecessary hack, and an unreliable one at that. The fundamental issue is that there is no hard boundary for telling different byte buffers apart, only heuristics: b'\x01\x02\x03\x00', for instance, is perfectly valid UTF-8 (this is a fundamental result of communication theory; there is no way around it!). The API that sends the data should therefore tell you how to interpret it.
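As a sketch of that idea, assume the API delivers a type tag with each value. The tag names "text" and "bytes" and the function interpret are hypothetical and not part of any real API. With a tag, the decoder no longer guesses, and the same bytes can legitimately mean different things:

```python
def interpret(value: bytes, kind: str) -> str:
    """Hypothetical decoder driven by a type tag the API is assumed to provide."""
    if kind == "text":
        return value.decode()
    if kind == "bytes":
        # Same conversion as in the question: decimal value of each byte, concatenated
        return ''.join(str(c) for c in value)
    raise ValueError(f"unknown kind: {kind!r}")


# The same four bytes are valid either way; only the tag decides
print(interpret(b'ABCD', "text"))   # ABCD
print(interpret(b'ABCD', "bytes"))  # 65666768
```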

Answer:

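literal = repr(value)[2:-1]  # decode() turns b'\x01' into U+0001; repr() spells out the \xNN escape
if literal.startswith("\\x") and literal[2] in "0123456789abcdef":  # repr() uses lowercase hex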
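An alternative to searching a string for "\x" is to check the bytes directly. Python's bytes repr writes \t, \n and \r by name, prints 0x20 to 0x7e as ASCII characters, and uses \xNN for every other byte, so "repr shows a \x escape" reduces to a range test. The helper name has_hex_escapes is hypothetical:

```python
def has_hex_escapes(value: bytes) -> bool:
    """Hypothetical helper: True if repr(value) would show at least one \\xNN escape.

    bytes repr writes \\t, \\n and \\r by name, prints 0x20-0x7e as ASCII
    characters (escaping only the backslash and quote), and uses \\xNN otherwise.
    """
    return any((b < 0x20 and b not in (0x09, 0x0A, 0x0D)) or b >= 0x7F for b in value)


print(has_hex_escapes(b'\x01\x02\x03\x00'))    # True
print(has_hex_escapes(b'2021:09:30 08:28:24')) # False
```

Unlike scanning repr() output, this cannot be fooled by a literal backslash byte followed by "x" in the data.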