I am working with a codebase that has pairs of classes: always one dataclass and one execution class. The dataclass serves as a data collector (as the name suggests).
To "connect" the dataclass to the other class, I set a class variable in the other class to make clear what the relevant dataclass is. This works fine - I can use this class variable to instantiate the data class as I please. However, it is not clear to me how I can use this to specify for a given method that it will return an instance of the linked data class.
Take this example (executable):
    from abc import ABC
    from dataclasses import dataclass
    from typing import ClassVar

    @dataclass
    class Name(ABC):
        name: str

    class RelatedName(ABC):
        _INDIVIDAL: ClassVar[Name]

        def return_name(self, **properties) -> Name:
            # There is a typing issue here too but you can ignore that for now
            return self._INDIVIDAL(**properties)

    @dataclass
    class BiggerName(Name):
        other_name: str

    class RelatedBiggerName(RelatedName):
        _INDIVIDAL: ClassVar[Name] = BiggerName

    if __name__ == "__main__":
        biggie = RelatedBiggerName()
        biggiename = biggie.return_name(name="Alfred", other_name="Biggie").other_name
        print(biggiename)
The script works fine, but there is a typing problem. In the second-to-last line of the script, the type checker reports that the attribute other_name is undefined for the Name class. This is to be expected, but I am not sure how I can change the return type of return_name so that it uses the class that is defined in _INDIVIDAL.
I tried def return_name(self, **properties) -> _INDIVIDAL but that naturally leads to name '_INDIVIDAL' is not defined.
Perhaps what I am after is not possible. Is it at all possible to have typing within a class that depends on class variables? I'm interested in Python 3.8 and higher.
CodePudding user response:
Can you use generics?
    from abc import ABC
    from dataclasses import dataclass
    from typing import ClassVar, TypeVar, Generic, Type

    T = TypeVar("T", bound="Name")

    @dataclass
    class Name(ABC):
        name: str

    class RelatedName(ABC, Generic[T]):
        # This would resolve what juanpa.arrivillaga pointed out, but mypy says:
        # ClassVar cannot contain type variables, so I guess your use-case is unsupported
        # _INDIVIDAL: ClassVar[Type[T]]
        # One option:
        # _INDIVIDAL: ClassVar
        # Second option to demonstrate Type[T]
        _INDIVIDAL: Type[T]

        def return_name(self, **properties) -> T:
            return self._INDIVIDAL(**properties)

    @dataclass
    class BiggerName(Name):
        other_name: str

    class RelatedBiggerName(RelatedName[BiggerName]):
        # see above
        _INDIVIDAL: Type[BiggerName] = BiggerName

    if __name__ == "__main__":
        biggie = RelatedBiggerName()
        biggiename = biggie.return_name(name="Alfred", other_name="Biggie").other_name
        print(biggiename)
mypy reports no errors on this, and I think conceptually this is what you want. I tested on Python 3.10.
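To illustrate how the generic base carries the link between the two classes, here is a minimal sketch that reuses the classes above and adds a second pair. TitledName and RelatedTitledName are hypothetical names introduced only for this example. At runtime each execution class builds its own dataclass, and a type checker should be able to read the return type of return_name from the type argument given in the base class list, although that static part is not something the snippet itself verifies.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Generic, Type, TypeVar

T = TypeVar("T", bound="Name")


@dataclass
class Name(ABC):
    name: str


class RelatedName(ABC, Generic[T]):
    # Annotation only; concrete subclasses assign the actual dataclass.
    _INDIVIDAL: Type[T]

    def return_name(self, **properties) -> T:
        return self._INDIVIDAL(**properties)


@dataclass
class BiggerName(Name):
    other_name: str


@dataclass
class TitledName(Name):  # hypothetical second dataclass
    title: str


class RelatedBiggerName(RelatedName[BiggerName]):
    _INDIVIDAL: Type[BiggerName] = BiggerName


class RelatedTitledName(RelatedName[TitledName]):  # hypothetical second execution class
    _INDIVIDAL: Type[TitledName] = TitledName


# Each execution class returns an instance of the dataclass it is linked to.
bigger = RelatedBiggerName().return_name(name="Alfred", other_name="Biggie")
titled = RelatedTitledName().return_name(name="Alfred", title="Dr.")
```

Note that the dataclass is named twice per subclass here, once as the type argument and once as the value of _INDIVIDAL, so the two have to be kept in sync by hand.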
CodePudding user response:
I agree with @cherrywoods that a custom generic base class seems like the way to go here.
I would like to add my own variation that should do what you want:
    from abc import ABC
    from dataclasses import dataclass
    from typing import Any, Generic, Optional, Type, TypeVar, get_args, get_origin

    T = TypeVar("T", bound="Name")

    @dataclass
    class Name(ABC):
        name: str

    class RelatedName(ABC, Generic[T]):
        _INDIVIDUAL: Optional[Type[T]] = None

        @classmethod
        def __init_subclass__(cls, **kwargs: Any) -> None:
            """Identifies and saves the type argument"""
            super().__init_subclass__(**kwargs)
            for base in cls.__orig_bases__:  # type: ignore[attr-defined]
                origin = get_origin(base)
                if origin is None or not issubclass(origin, RelatedName):
                    continue
                type_arg = get_args(base)[0]
                # Do not set the attribute for GENERIC subclasses!
                if not isinstance(type_arg, TypeVar):
                    cls._INDIVIDUAL = type_arg
                    return

        @classmethod
        def get_individual(cls) -> Type[T]:
            """Getter ensuring that we are not dealing with a generic subclass"""
            if cls._INDIVIDUAL is None:
                raise AttributeError(
                    f"{cls.__name__} is generic; type argument unspecified"
                )
            return cls._INDIVIDUAL

        def __setattr__(self, name: str, value: Any) -> None:
            """Prevent instances from overwriting `_INDIVIDUAL`"""
            if name == "_INDIVIDUAL":
                raise AttributeError("Instances cannot modify `_INDIVIDUAL`")
            super().__setattr__(name, value)

        def return_name(self, **properties: Any) -> T:
            return self.get_individual()(**properties)

    @dataclass
    class BiggerName(Name):
        other_name: str

    class RelatedBiggerName(RelatedName[BiggerName]):
        pass

    if __name__ == "__main__":
        biggie = RelatedBiggerName()
        biggiename = biggie.return_name(name="Alfred", other_name="Biggie").other_name
        print(biggiename)
Works without problems or complaints from mypy --strict.
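To make the runtime behaviour of this base class more concrete, the following sketch exercises its two guards. It uses a condensed copy of RelatedName from the code above and a hypothetical intermediate class StillGeneric that leaves the type argument open. It is only meant to show what happens at runtime under those assumptions.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Any, Generic, Optional, Type, TypeVar, get_args, get_origin

T = TypeVar("T", bound="Name")


@dataclass
class Name(ABC):
    name: str


@dataclass
class BiggerName(Name):
    other_name: str


class RelatedName(ABC, Generic[T]):
    # Condensed copy of the base class shown above.
    _INDIVIDUAL: Optional[Type[T]] = None

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        for base in cls.__orig_bases__:  # type: ignore[attr-defined]
            origin = get_origin(base)
            if origin is None or not issubclass(origin, RelatedName):
                continue
            type_arg = get_args(base)[0]
            if not isinstance(type_arg, TypeVar):
                cls._INDIVIDUAL = type_arg
                return

    @classmethod
    def get_individual(cls) -> Type[T]:
        if cls._INDIVIDUAL is None:
            raise AttributeError(
                f"{cls.__name__} is generic; type argument unspecified"
            )
        return cls._INDIVIDUAL

    def __setattr__(self, name: str, value: Any) -> None:
        if name == "_INDIVIDUAL":
            raise AttributeError("Instances cannot modify `_INDIVIDUAL`")
        super().__setattr__(name, value)


class StillGeneric(RelatedName[T]):  # hypothetical: type argument left open
    pass


class RelatedBiggerName(StillGeneric[BiggerName]):  # type argument fixed here
    pass


# 1) The still-generic class has no dataclass recorded, so the getter raises.
try:
    StillGeneric.get_individual()
except AttributeError as exc:
    generic_error = str(exc)

# 2) The concrete subclass picked up its type argument during subclassing.
resolved = RelatedBiggerName.get_individual()

# 3) Instances are not allowed to overwrite the attribute.
try:
    RelatedBiggerName()._INDIVIDUAL = Name  # type: ignore[assignment]
except AttributeError as exc:
    instance_error = str(exc)
```

The `__setattr__` guard only covers assignment through an instance. Assigning to `_INDIVIDUAL` on the class object itself is not intercepted by it.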
Differences
- The _INDIVIDUAL attribute is no longer marked as a ClassVar because that (for no good reason) disallows type variables.
- To protect it from being changed by instances, we use a simple customization of the __setattr__ method.
- You no longer need to explicitly set _INDIVIDUAL on any specific subclass of RelatedName. This is taken care of automatically during subclassing by __init_subclass__. (If you are interested in details, I explain them in this post.)
- Direct access to the _INDIVIDUAL attribute is discouraged. Instead there is the get_individual getter. If the additional parentheses annoy you, I suppose you can play around with descriptors to construct a property-like situation for _INDIVIDUAL. (Note: You can still just use cls._INDIVIDUAL or self._INDIVIDUAL, it's just that there will be the possible None-type issue.)
- The base class is obviously a bit more complicated this way, but on the other hand the creation of specific subclasses is much nicer in my opinion.
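The descriptor idea mentioned in the list can be sketched roughly as follows. IndividualAccessor and the attribute name individual are hypothetical and not part of the code above, the `__setattr__` guard is left out for brevity, and the descriptor is annotated with Any, so this only demonstrates the runtime behaviour. How precisely a type checker infers the result through such a descriptor has not been verified here.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Any, Generic, Optional, Type, TypeVar, get_args, get_origin

T = TypeVar("T", bound="Name")


@dataclass
class Name(ABC):
    name: str


@dataclass
class BiggerName(Name):
    other_name: str


class IndividualAccessor:
    """Hypothetical descriptor giving read-only access to the type argument."""

    def __get__(self, instance: Any, owner: Any) -> Any:
        # Runs for both `SomeClass.individual` and `some_instance.individual`;
        # `owner` is the class through which the attribute was looked up.
        individual = owner._INDIVIDUAL
        if individual is None:
            raise AttributeError(
                f"{owner.__name__} is generic; type argument unspecified"
            )
        return individual

    def __set__(self, instance: Any, value: Any) -> None:
        # Defining __set__ makes this a data descriptor, so an instance
        # cannot shadow or replace `individual`.
        raise AttributeError("`individual` is read-only")


class RelatedName(ABC, Generic[T]):
    _INDIVIDUAL: Optional[Type[T]] = None
    individual = IndividualAccessor()

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        for base in cls.__orig_bases__:  # type: ignore[attr-defined]
            origin = get_origin(base)
            if origin is None or not issubclass(origin, RelatedName):
                continue
            type_arg = get_args(base)[0]
            if not isinstance(type_arg, TypeVar):
                cls._INDIVIDUAL = type_arg
                return

    def return_name(self, **properties: Any) -> T:
        # No parentheses needed to obtain the linked dataclass.
        return self.individual(**properties)


class RelatedBiggerName(RelatedName[BiggerName]):
    pass


via_class = RelatedBiggerName.individual
via_instance = RelatedBiggerName().individual
made = RelatedBiggerName().return_name(name="Alfred", other_name="Biggie")

try:
    RelatedBiggerName().individual = Name
except AttributeError as exc:
    write_error = str(exc)
```

As with the getter, accessing individual on a class whose type argument is still open raises an AttributeError instead of returning None.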
Hope this helps.