dataclasses.dataclass with __init_subclass__

My confusion is with the interplay between dataclasses & __init_subclass__.

I am trying to implement a base class that will exclusively be inherited from. In this example, A is the base class. It is my understanding from reading the python docs on dataclasses that simply adding a decorator should automatically create some special dunder methods for me. Quoting their docs:

For example, this code:

from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

will add, among other things, a __init__() that looks like:

def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand

This is an instance variable, no? From the classes docs, it shows a toy example, which reads super clear.

class Dog:

    kind = 'canine'         # class variable shared by all instances

    def __init__(self, name):
        self.name = name    # instance variable unique to each instance

A main gap in my understanding is: is a dataclass field an instance variable or a class variable? From my testing below, it is a class variable, but the docs show an instance variable as its approximate implementation. It may be that most of my problem is there. I've also read the python docs on classes, which do not go into dataclasses.

The problem continues with the seemingly limited docs on __init_subclass__, which yields another gap in my understanding. I am also making use of __init_subclass__, in order to enforce that my subclasses have indeed instantiated the variable x.

Below, we have A, which has an instance variable x set to None. B, C, and D all subclass A, in different ways (hoping) to determine implementation specifics.

B inherits from A, setting a class variable of x. D is a dataclass, which inherits from A, setting what would appear to be a class variable of x. However, given their docs from above, it seems that the class variable x of D should be created as an instance variable. Thus, when D is created, it should first call __init_subclass__, in that function, it will check to see if x exists in D - by my understanding, it should not; however, the code passes scot-free. I believe D() will create x as an instance variable because the dataclass docs show that this will create an __init__ for the user.

"will add, among other things..." <insert __init__ code>

I must be wrong here but I'm struggling to put it together.

import dataclasses

class A:
    def __init__(self):
        self.x = None

    def __init_subclass__(cls):
        if not getattr(cls, 'x') or not cls.x:
            raise TypeError(
                f'Cannot instantiate {cls.__name__}, as all subclasses of {cls.__base__.__name__} must set x.'
            )


class B(A):
    x = 'instantiated-in-b'


@dataclasses.dataclass
class D(A):
    x : str = 'instantiated-in-d'



class C(A):
    def __init__(self):
        self.x = 'instantiated-in-c'


print('B', B())
print('D', D())
print('C', C())

The code, per my expectation, properly fails with C(). Executing the above code will succeed with D, which does not compute for me. In my understanding (which is wrong), I am defining a field, which means that dataclass should expand my class variables as instance variables. (The previous statement is most probably where I am wrong, but I cannot find anything that documents this behavior. Are data classes not actually expanding class variables as instance variables? It certainly appears that way from the visual explanation in their docs.) From the dataclass docs:

The dataclass() decorator examines the class to find fields. A field is defined as a class variable that has a type annotation.

Thus - why - when creating an instance D() - does it slide past the __init_subclass__ of its parent A?

Apologies for the lengthy post, I must be missing something simple, so if anyone can point me in the right direction, that would be excellent. TIA!

I have just found the implementation of dataclasses in the CPython GitHub repository.

CodePudding user response：

__init_subclass__ is called when initializing a subclass. Not when initializing an instance of a subclass - it's called when initializing the subclass itself. Your exception occurs while trying to create the C class, not while trying to evaluate C().
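To see the timing for yourself, here is a minimal sketch (the names Base, Child and events are made up for illustration, they are not from your code) that records when the hook fires relative to the class statement and to instantiation:

```python
events = []  # records the order in which things happen


class Base:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, while the subclass itself is being created.
        events.append(f"__init_subclass__ for {cls.__name__}")


events.append("before class statement")


class Child(Base):
    pass


events.append("after class statement")

Child()  # creating an instance does not call __init_subclass__ again
events.append("after Child()")

for event in events:
    print(event)
```

The hook shows up exactly once, between the "before" and "after class statement" entries, and nothing is recorded for the Child() call.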

Decorators, such as @dataclass, are a post-processing mechanism, not a pre-processing mechanism. A class decorator takes an existing class that has already gone through all the standard initialization, including __init_subclass__, and modifies the class. Since this happens after __init_subclass__, __init_subclass__ doesn't see any of the modifications that @dataclass performs.
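A small sketch of that ordering, assuming a hypothetical Base/Child pair rather than your A/D: the hook takes a snapshot of what the class looks like when it runs, and we compare that with the class after the decorator has finished.

```python
import dataclasses

seen = {}  # snapshot taken inside __init_subclass__


class Base:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The decorator has not run yet: the class is still a plain class
        # with an ordinary class attribute x and no generated __init__.
        seen[cls.__name__] = {
            "x": cls.__dict__.get("x"),
            "is_dataclass": dataclasses.is_dataclass(cls),
            "own_init": "__init__" in cls.__dict__,
        }


@dataclasses.dataclass
class Child(Base):
    x: str = "set-in-child"


print("inside hook:", seen["Child"])
print("after decorator:", dataclasses.is_dataclass(Child), "__init__" in Child.__dict__)
```

Inside the hook the class is not yet a dataclass and has no __init__ of its own, but x is already there as a plain class attribute, which is why the check passes.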

Even if the decorator were to be applied first, D still would have passed the check in A.__init_subclass__, because the dataclass decorator will set D.x to the default value of the x field anyway, so __init_subclass__ will find a value of x. In this case, that happens to be the same thing you set D.x to in the original class definition, but it can be a different object in cases where you construct field objects explicitly.
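This also answers the "class variable or instance variable" question: with a default, you get both. A rough sketch with explicit field objects (WithDefault and WithFactory are hypothetical names for illustration):

```python
import dataclasses


@dataclasses.dataclass
class WithDefault:
    # In the class body x is a Field object; the decorator replaces it
    # with the default value on the class.
    x: str = dataclasses.field(default="from-field")


@dataclasses.dataclass
class WithFactory:
    # With default_factory there is no single default value to store,
    # so the decorator removes the class attribute entirely.
    x: list = dataclasses.field(default_factory=list)


d = WithDefault("per-instance")

print(WithDefault.x)             # class attribute holds the default
print(vars(d))                   # instance attribute set by the generated __init__
print(hasattr(WithFactory, "x")) # no class-level x in the factory case
print(vars(WithFactory()))
```

So the generated __init__ does create an instance attribute, but the default stays behind as a class attribute as well. Only in the default_factory case (or with no default at all) would a class-level check like yours find nothing.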

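(Also, you probably wanted to write hasattr instead of getattr in not getattr(cls, 'x'). With no default argument, getattr raises AttributeError when x is missing, so C actually fails with an AttributeError and your TypeError is never reached.)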
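For reference, a sketch of how the check might look with hasattr. This is one possible version, not the only way to write it, and the error message wording is my own:

```python
class A:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # hasattr returns False instead of raising when x is missing,
        # so the TypeError below is actually reachable.
        if not hasattr(cls, "x") or not cls.x:
            raise TypeError(
                f"Cannot create {cls.__name__}: subclasses of A must set "
                f"a non-empty class attribute x."
            )


class B(A):
    x = "set-in-b"  # passes the check


error = None
try:
    # The class statement itself raises; C() is never reached.
    class C(A):
        def __init__(self):
            self.x = "set-in-c"  # instance attribute only, invisible to the hook
except TypeError as exc:
    error = str(exc)

print(error)
```

Note that this still only checks class attributes. An x assigned in __init__ cannot be seen at class creation time, since no instance exists yet.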