Change json.dumps behaviour : customize serialization

Time:02-03

Imagine I have a dict {"a": "hello", "b": b"list"}:

  • 'a' is a string
  • 'b' is a byte string

I would like to serialize the dict into the "JSON"(*) string '{"a": "hello", "b": list}'

(*) : not really json compliant

For that, I've written this function, and it works:

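import json

def stringify(obj):
    def my(obj):
        # Wrap bytes in markers so they can be unquoted after serialization.
        if isinstance(obj, bytes):
            return "<:<:%s:>:>" % obj.decode()
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")
    return json.dumps(obj, default=my).replace('"<:<:', "").replace(':>:>"', "")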

(The "<:<:" and ":>:>" markers are added before serialization and stripped out again after JSON serialization to obtain the desired result.)

It's a little bit hacky to use string substitution to get the result, but it works ;-)

I wonder whether it can be done in a better, more Pythonic way. Do you have any ideas?

EDIT: I would like to rewrite my stringify in a better way, so that these assertions hold:

assert stringify( dict(a="hello",b=b"byte") ) == '{"a": "hello", "b": byte}'
assert stringify( ["hello", b"world"] ) == '["hello", world]'
assert stringify( "hello" ) == '"hello"'
assert stringify( b"world" ) == "world"

CodePudding user response:

To get your desired output, i.e. '{"a": "hello", "b": list}', you will need to make some ugly but fair cosmetic changes, such as building the string representation of the dictionary yourself. The plain dictionary literal {"a": "hello", "b": list} generally makes no sense as Python code (this specific example does, but only because list is a built-in; with "mymethod" or any other undefined name it would not).

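def stringify(input_dict: dict) -> str:
    # Collect the parts in a new list so the caller's dict is not mutated.
    parts = []
    for k, v in input_dict.items():
        if isinstance(v, bytes):
            parts.append(f'{k}: {v.decode()}')  # bytes: decoded, no quotes
        else:
            parts.append(f'{k}: "{v}"')  # everything else: quoted
    return '{' + ', '.join(parts) + '}'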

Here we literally reconstruct the dictionary as text, piece by piece. It is not that intuitive, but it nonetheless works as intended.
Your solution does work, but it breaks if a string value in the dictionary happens to contain the marker characters, for example a string that starts with <:<: or ends with :>:>.
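To make the marker collision concrete, here is a small sketch that reuses the stringify from the question unchanged and feeds it an ordinary string that happens to carry the markers. The variable names normal and collided are only illustrative.

```python
import json

def stringify(obj):
    # The marker-based approach, reproduced from the question.
    def my(o):
        if isinstance(o, bytes):
            return "<:<:%s:>:>" % o.decode()
    return json.dumps(obj, default=my).replace('"<:<:', "").replace(':>:>"', "")

# A bytes value is unquoted, as intended.
normal = stringify({"a": "hello", "b": b"list"})

# A plain str that carries the markers is unquoted as well, which is not intended.
collided = stringify({"a": "<:<:hello:>:>"})

print(normal)    # {"a": "hello", "b": list}
print(collided)  # {"a": hello}
```

The second result has lost both the markers and the quotes, even though the value was a str and not bytes.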


Running this code:

d = {"a": "hello", "b": b"list"}
serialized_dict = stringify(d)
print(serialized_dict)

Output:

{a: "hello", b: list}

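The result is a plain str, NOT valid JSON. Note that the keys come out unquoted here, unlike in the '{"a": "hello", "b": list}' string requested in the question.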
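As a quick check of that claim, the sketch below passes the printed string to the standard json.loads and records whether it parses. The names serialized and is_valid_json are illustrative only.

```python
import json

serialized = '{a: "hello", b: list}'  # the output shown above

try:
    json.loads(serialized)
    is_valid_json = True
except json.JSONDecodeError:
    # Unquoted keys and the bare word list are both rejected by the JSON parser.
    is_valid_json = False

print(is_valid_json)  # False
```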


Edit: a more generic stringify function.

We can do it smarter by making the function call itself recursively: if we encounter an atomic object (e.g. int, str, etc.) we return it with quotation marks; otherwise (i.e. for bytes) we return it decoded, without quotation marks.

def generic_stringify(input_generic_object):
    if isinstance(input_generic_object, dict):
        # Same idea as stringify above, but with quoted keys and recursive values.
        return '{' + ', '.join(
            f'"{k}": {generic_stringify(v)}' for k, v in input_generic_object.items()
        ) + '}'

    elif isinstance(input_generic_object, list):
        return '[' + ', '.join([generic_stringify(v) for v in input_generic_object]) + ']'

    elif isinstance(input_generic_object, bytes):
        return input_generic_object.decode()

    else:
        return f'"{input_generic_object}"'

Here we return the decoded value if the object is of type bytes, and wrap it in quotation marks if it is any other non-container type, such as str:

print(generic_stringify(dict(a="hello", b=b"byte")))
print(generic_stringify(["hello", b"world", {"c": b"list"}]))
print(generic_stringify("hello"))
print(generic_stringify(b"world"))

Outputs:

{"a": "hello", "b": byte}
["hello", world, {"c": list}]
"hello"
world
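One limitation of wrapping every leaf in an f-string is that numbers come out quoted and embedded quotation marks are not escaped. A possible variant, sketched here under the assumption that dict keys are strings, delegates every non-bytes leaf to json.dumps. The name stringify_escaped is hypothetical and does not come from the answer above.

```python
import json

def stringify_escaped(obj):
    # Hypothetical variant: bytes are emitted raw, everything else goes through json.dumps.
    if isinstance(obj, bytes):
        return obj.decode()
    if isinstance(obj, dict):
        # Assumes keys are strings (or at least convertible with str()).
        items = (f"{json.dumps(str(k))}: {stringify_escaped(v)}" for k, v in obj.items())
        return "{" + ", ".join(items) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ", ".join(stringify_escaped(v) for v in obj) + "]"
    # str, int, float, bool and None get standard JSON formatting and escaping.
    return json.dumps(obj)

# The assertions from the question.
assert stringify_escaped(dict(a="hello", b=b"byte")) == '{"a": "hello", "b": byte}'
assert stringify_escaped(["hello", b"world"]) == '["hello", world]'
assert stringify_escaped("hello") == '"hello"'
assert stringify_escaped(b"world") == "world"
```

With this variant an int such as 1 is emitted as 1 rather than "1", and a string containing a double quote is escaped the way json.dumps normally does it.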