I would like to structure and unstructure an attrs object that includes dict fields whose keys are simple frozen attrs classes. This works very well for objects created at runtime, but the frozen attrs keys make un/structuring with cattrs difficult.
This is a simple example of the problem:
from typing import Optional

import attr
import cattr

# Simple attrs class that contains only a single primitive data type.
@attr.s(frozen=True)
class AbstractID:
    _id: Optional[int] = attr.ib()

    def __str__(self) -> str:
        if self._id is not None:
            return f"A{self._id}"
        else:
            return "—"

@attr.s(auto_attribs=True)
class Database:
    # Factory(dict) gives each instance its own dict instead of a shared default.
    storage: dict[AbstractID, str] = attr.Factory(dict)

# Attempt to unstructure using cattrs
db = Database()
db.storage[AbstractID(1)] = "some data"
cattr.unstructure(db)
# Raises TypeError: unhashable type: 'dict'
Is there a way to serialize the data so that int or str dict keys are only used during import/export, while the rest of the code keeps using AbstractID keys? I saw that cattrs offers hooks to customize the serialization process, but I can't figure out how to reduce the AbstractID to an int when unstructuring, or how to structure it back into an AbstractID.
Can this be done?
CodePudding user response:
The default approach fails because it tries to generate:
{"storage": {{"_id": 1}: "some data"}}
Python dicts are unhashable, so they cannot be used as keys of other dicts.
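To see the failure in isolation, here is a minimal sketch that uses only the standard library. It says nothing about how cattrs works internally; it only shows that a dict cannot serve as a key, while the single primitive value inside it can.

```python
# What the default unstructuring would produce for the key.
unstructured_key = {"_id": 1}

try:
    # A dict is mutable and therefore unhashable, so this raises TypeError.
    broken = {unstructured_key: "some data"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Reducing the key to its single primitive value avoids the problem.
working = {unstructured_key["_id"]: "some data"}
print(working)  # {1: 'some data'}
```

This is why the fix is to make the key unstructure to an int rather than to a dict.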
Since we'll be customizing behavior, we'll use a separate instance of a converter. I'll also be using the new attrs APIs since they're cleaner. Here's what you want to do:
from typing import Optional

from attr import define, frozen, Factory
from cattr import GenConverter

# Simple attr that contains only a single primitive data type.
@frozen
class AbstractID:
    _id: Optional[int]

    def __str__(self) -> str:
        if self._id is not None:
            return f"A{self._id}"
        else:
            return "—"

@define
class Database:
    storage: dict[AbstractID, str] = Factory(dict)

# Attempt to unstructure using cattrs
db = Database()
db.storage[AbstractID(1)] = "some data"

c = GenConverter()
c.register_unstructure_hook(AbstractID, lambda aid: aid._id)
c.register_structure_hook(AbstractID, lambda v, _: AbstractID(v))

print(c.unstructure(db))  # {'storage': {1: 'some data'}}
print(c.structure(c.unstructure(db), Database))  # Database(storage={AbstractID(_id=1): 'some data'})
cattrs makes easy work of this stuff.
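One thing to watch if the unstructured data is then written out as JSON: JSON object keys are always strings, so the int key does not survive the round trip unchanged. The sketch below uses only the standard library and reproduces the unstructured dict by hand. `coerce_id` is a hypothetical helper, not part of cattrs, and it assumes the id is never None.

```python
import json

# The unstructured form printed above, reproduced by hand.
unstructured = {"storage": {1: "some data"}}

# json.dumps turns the int key into the string "1".
round_tripped = json.loads(json.dumps(unstructured))
print(round_tripped)  # {'storage': {'1': 'some data'}}


def coerce_id(value):
    """Hypothetical helper: convert a raw key back to int (assumes non-None ids)."""
    return int(value)


restored_keys = {coerce_id(k): v for k, v in round_tripped["storage"].items()}
print(restored_keys)  # {1: 'some data'}
```

With the converter above, the structure hook could apply the same coercion, for example `lambda v, _: AbstractID(coerce_id(v))`. That is a suggestion for the JSON case, not something the original hooks do.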
CodePudding user response:
Well, you can always use marshmallow for stuff like this. It allows you to fully customize the process via schemas. It is usually a good idea to keep your serialization/deserialization separate from your business logic anyway. So, for your example it could look something like this:
from typing import Any

from marshmallow import Schema, fields, post_dump, pre_load, post_load

class AbstractIdSchema(Schema):
    _id = fields.Integer()

    @pre_load
    def pre_load(self, obj: int, **_: Any) -> dict:
        return {'_id': obj}

    @post_load
    def post_load(self, data: dict, **_: Any) -> AbstractID:
        return AbstractID(id=data['_id'])

    @post_dump
    def post_dump(self, data: dict, **_: Any) -> int:
        return data['_id']

class DatabaseSchema(Schema):
    storage = fields.Dict(
        keys=fields.Nested(AbstractIdSchema()),
        values=fields.String(),
    )

    @post_load
    def post_load(self, data: dict, **_: Any) -> Database:
        return Database(**data)

print(db)
db_schema = DatabaseSchema()
serialized_db = db_schema.dump(db)
print(serialized_db)
deserialized_db = db_schema.load(serialized_db)
print(deserialized_db)
# Prints:
# Database(storage={AbstractID(_id=1): 'some data'})
# {'storage': {1: 'some data'}}
# Database(storage={AbstractID(_id=1): 'some data'})
It would look a bit simpler if _id were simply id (i.e. the init argument had the same name as the attribute); then you could do AbstractID(**data) in post_load.
Then again, it might be overkill if your models really are that simple. But if reality is more complex, it might be the way to go.