Abstract classes and metaclasses with dataclasses in Python

I'm trying to move a field and a method shared by my classes into a common base class, for working with MongoDB.

Example of the working class:

@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str

    _id: Optional[int] = None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result

How can I move _id and __getdict__ into an abstract base class so that everything still works?

@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str

@dataclass
class Mongodata(ABCMeta):
    @property
    @abstractmethod
    def _id(self) -> Optional[int]:
        return None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result

Can you also explain how abstract classes and metaclasses differ? I came from Java, and after reading about them I still don't understand.

Answer:

Since you mentioned you're on Python 3.9, you can set it up much as you had it above (using ABC rather than ABCMeta as the base class). However, if you declare the fields in Article as above and add a field definition to the superclass like this:

@dataclass
class Mongodata(ABC):
    _id: Optional[int] = None

Then actually running the code raises a TypeError:

TypeError: non-default argument 'name' follows default argument

This happens because of the order in which dataclasses collects fields when inheritance is involved: it takes the fields of the base classes first (here, _id from Mongodata), then the fields defined in Article itself. The generated __init__ would therefore start with a parameter that has a default value, followed by parameters that don't, which is not allowed, so dataclasses raises a TypeError.
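To see the ordering for yourself, here is a minimal sketch using hypothetical Base and Child classes (not from the question). dataclasses.fields() lists the inherited _id first, and defining a subclass field without a default raises the TypeError described above. The exact error wording differs between Python versions, so the sketch only prints it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Base:
    _id: Optional[int] = None


@dataclass
class Child(Base):
    # A default is needed here, otherwise this class definition would fail.
    name: str = ""


# Inherited fields come first, then the subclass's own fields.
field_order = [f.name for f in fields(Child)]
print(field_order)  # ['_id', 'name']

# Without a default, 'name' would follow the defaulted '_id' in __init__.
error_message = None
try:
    @dataclass
    class Broken(Base):
        name: str
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```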

You would hit the same problem if you wrote an __init__ method for Article by hand with the parameters in that order:

    def __init__(self, _id: Optional[int] = None, name: str, quantity: int, description: str):
                                                             ^
SyntaxError: non-default argument follows default argument

In Python 3.9, the simplest workaround seems to be to give every field in the subclass a default value:

from abc import ABC
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Mongodata(ABC):
    _id: Optional[int] = None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result


@dataclass
class Article(Mongodata):

    name: Optional[str] = None
    quantity: Optional[int] = None
    description: Optional[str] = None

The downside is that positional arguments no longer line up: the first positional argument passed to the constructor is assigned to _id:

a = Article('123', 321, 'desc')  # Article(_id='123', name=321, quantity='desc', description=None)

You could pass None as the first positional argument so that it is assigned to _id, but it's cleaner to pass keyword arguments to the constructor instead:

a = Article(name='123', quantity=321, description='desc')

This feels even more natural with the kw_only parameter that Python 3.10 added to dataclasses to solve exactly this problem; more on that below.
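If you stay on 3.9 with the default-None workaround, nothing stops someone from leaving name, quantity or description unset. One possible mitigation, sketched here as an assumption rather than anything dataclasses does for you, is to check for leftover None values in __post_init__. The method name get_dict (used instead of a __dunder__ name, since those are reserved for Python itself) and the "every field except _id is required" rule are hypothetical choices.

```python
from abc import ABC
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Mongodata(ABC):
    _id: Optional[int] = None

    def get_dict(self):
        # Everything except the MongoDB id.
        result = asdict(self)
        result.pop("_id")
        return result


@dataclass
class Article(Mongodata):
    name: Optional[str] = None
    quantity: Optional[int] = None
    description: Optional[str] = None

    def __post_init__(self):
        # Hypothetical rule: every field except _id must be supplied.
        missing = [f.name for f in fields(self)
                   if f.name != "_id" and getattr(self, f.name) is None]
        if missing:
            raise TypeError(f"missing required fields: {missing}")


a = Article(name="abc", quantity=1, description="desc")
print(a.get_dict())  # {'name': 'abc', 'quantity': 1, 'description': 'desc'}
```

With this check, Article(name='abc') raises a TypeError listing quantity and description, and the positional Article('123', 321, 'desc') call from above is also rejected, because description is left as None.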

A Metaclass Approach

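Another option is to write a function and use it as a metaclass. Python calls whatever is passed as metaclass= with the class name, the tuple of base classes, and the class namespace dict, so a plain function works as long as it returns the new class: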

from dataclasses import asdict
from typing import Optional


def add_id_and_get_dict(name: str, bases: tuple[type, ...], cls_dict: dict):
    """Metaclass to add an `_id` field and a `get_dict` method."""

    # Get class annotations
    cls_annotations = cls_dict['__annotations__']
    # This assigns the `_id: Optional[int]` annotation
    cls_annotations['_id'] = Optional[int]
    # This assigns the `_id = None` assignment
    cls_dict['_id'] = None

    def get_dict(self):
        result = asdict(self)
        result.pop('_id')
        return result

    # add get_dict() method to the class
    cls_dict['get_dict'] = get_dict

    # create and return a new class
    cls = type(name, bases, cls_dict)
    return cls
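A caveat of using a plain function as the metaclass, shown in this small sketch with hypothetical add_greeting, Foo and Bar names: the function runs only for the class that names it. The class it returns is an ordinary type instance, so subclasses are created by type and simply inherit whatever the function added.

```python
def add_greeting(name, bases, namespace):
    """Plain function used as a metaclass: injects a greet() method."""
    def greet(self):
        return f"hello from {name}"

    namespace["greet"] = greet
    return type(name, bases, namespace)


class Foo(metaclass=add_greeting):
    pass


class Bar(Foo):
    # add_greeting is not called again here, because type(Foo) is plain `type`.
    pass


print(type(Foo) is type)  # True
print(Foo().greet())      # hello from Foo
print(Bar().greet())      # hello from Foo
```

For the Article example, this means a further @dataclass subclass of Article would not get its own _id. It would inherit Article's fields with the defaulted _id last, so adding a field without a default there would run into the same TypeError as before.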

With that in place, the dataclass definition becomes a little simpler. Strictly speaking you don't need to define get_dict in the class body, since the metaclass function replaces it with the real implementation, but the stub lets your IDE know that the method exists.

from dataclasses import dataclass
from typing import Any


@dataclass
class Article(metaclass=add_id_and_get_dict):

    name: str
    quantity: int
    description: str

    # Add for type hinting, so the IDE knows such a method exists.
    def get_dict(self) -> dict[str, Any]:
        ...

Creating new Article objects is now more intuitive, since the required fields come first and _id is optional:

a = Article('abc', 123, 'desc')
print(a)             # Article(name='abc', quantity=123, description='desc', _id=None)
print(a._id)         # None
print(a.get_dict())  # {'name': 'abc', 'quantity': 123, 'description': 'desc'}

a2 = Article('abc', 321, 'desc', _id=12345)
print(a2)               # Article(name='abc', quantity=321, description='desc', _id=12345)
print(a2._id)           # 12345
print(a2.get_dict())    # {'name': 'abc', 'quantity': 321, 'description': 'desc'}
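Since you also asked how abstract classes and metaclasses differ, here is a small sketch (the Repository, ArticleRepository and NotAnAbc names are hypothetical). A metaclass is the class of a class: it controls how class objects themselves are created, which is what add_id_and_get_dict does above, and Java has no direct equivalent. An abstract base class is an ordinary class whose metaclass is abc.ABCMeta; ABCMeta records the @abstractmethod methods, and Python refuses to instantiate the class while any of them remain unimplemented, which is roughly what a Java abstract class gives you. Subclassing ABCMeta directly, as in the question's Mongodata(ABCMeta), therefore defines a new metaclass rather than an abstract base class.

```python
from abc import ABC, ABCMeta, abstractmethod


class Repository(ABC):
    """Abstract base class, roughly like a Java abstract class."""

    @abstractmethod
    def collection_name(self) -> str:
        ...


class ArticleRepository(Repository):
    def collection_name(self) -> str:
        return "articles"


# ABC is just a convenience base class whose metaclass is ABCMeta.
print(type(Repository) is ABCMeta)  # True

try:
    Repository()  # abstract method not implemented, so this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

print(ArticleRepository().collection_name())  # articles


# Subclassing ABCMeta (as in the question) creates a metaclass instead.
class NotAnAbc(ABCMeta):
    pass


print(issubclass(NotAnAbc, type))  # True
```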

Keyword-only Arguments

In Python 3.10 and later, if you don't want to give every field in the subclass a default value, another option is to decorate the superclass with @dataclass(kw_only=True), which makes the fields defined in that class keyword-only.

Alternatively, you can use the KW_ONLY sentinel that dataclasses provides in Python 3.10+ as a type annotation, as shown below, which keeps things simple and intuitive:

from abc import ABC
from dataclasses import dataclass, asdict, KW_ONLY
from typing import Optional


@dataclass
class Mongodata(ABC):
    _: KW_ONLY
    _id: Optional[int] = None

    @property
    def dict(self):
        result = asdict(self)
        result.pop("_id")
        return result


# noinspection PyDataclass
@dataclass
class Article(Mongodata):

    name: str
    quantity: int
    description: str

Any fields defined after _: KW_ONLY in the same class become keyword-only arguments to the constructor.

Now the usage is what you wanted: the Article fields can be passed positionally or by keyword, while _id has to be passed by keyword:

a = Article(name='123', quantity=123, description='desc')
print(a)        # Article(_id=None, name='123', quantity=123, description='desc')
print(a._id)    # None
print(a.dict)   # {'name': '123', 'quantity': 123, 'description': 'desc'}

a2 = Article('123', 321, 'desc', _id=112233)
print(a2)        # Article(_id=112233, name='123', quantity=321, description='desc')
print(a2._id)    # 112233
print(a2.dict)   # {'name': '123', 'quantity': 321, 'description': 'desc'}

As for why this works: only the superclass marks a field as keyword-only (here via KW_ONLY; kw_only=True on the superclass has the same effect), so the only change is that _id becomes a keyword-only argument to the constructor. The fields in the subclass can still be passed either by keyword or positionally, because they were not marked keyword-only.

An easier way to think about it is that the __init__() method dataclasses generates effectively has this signature:

def __init__(self, name: str, quantity: int, description: str, *, _id: Optional[int] = None):

In Python 3 generally (not just 3.10), a bare * in a function signature means that every parameter after it is keyword-only. Here _id is placed after all the positional parameters from the subclass, as a keyword-only parameter. The signature is therefore valid: the rule that parameters without defaults can't follow parameters with defaults applies only to positional parameters, and keyword-only parameters may have defaults in any order.