Confusing behavior when using TypeVar with dataclass

If I define a type like the following:

StringType = TypeVar('StringType', str, None)

Then I can use it in my class or function definitions:

class StringClass:
    def __init__(self, s: StringType = None):
        self.s = s

def func(s: StringType = None):
    return s

My type checker, Pylance, works great!

But if I define my class using dataclass:

@dataclass
class StringClass:
    s: StringType = None

Then Pylance complains: Type variable "StringType" has no meaning in this context

I do not understand why the second definition using dataclass does not work.

Can someone explain this to me, and hopefully explain how to get dataclass to work with the new type?

Answer:

First, let's look at what you've got right now.

StringType = TypeVar('StringType', str, None)

This is a type variable, i.e. a generic type parameter. It does not say that StringType is an alias for str | None. It says that StringType is a variable that may be introduced into a scope later, and that when it is, its possible values are str or None.
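To make the distinction concrete, here is a minimal sketch. The names echo, echo_alias and MaybeStr are made up for illustration and are not from the question, and the alias is spelled with Optional so the snippet also runs on interpreters older than 3.10.

```python
from typing import Optional, TypeVar

# Constrained type variable: each use site resolves to exactly one of str or None.
StringType = TypeVar("StringType", str, None)

# Plain alias (hypothetical name): always means "str or None".
MaybeStr = Optional[str]


def echo(s: StringType) -> StringType:
    # Generic function: a checker ties the return type to the argument type,
    # so echo("a") is seen as str and echo(None) as None.
    return s


def echo_alias(s: MaybeStr) -> MaybeStr:
    # Not generic: the result is seen as str | None whatever the argument was.
    return s
```

At runtime both functions behave identically. The difference only shows up in what the type checker infers for each call.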

Now Python is unfortunately weird about when it introduces type variables into scope. In a language like Java, it's always explicit (i.e. anytime you want to introduce a type variable, you explicitly write it in brackets as <T>). But Python's rules are different.

If a type variable (which is not already in scope) is used in a function, including a member function, then the function itself becomes generic.
If a type variable is used in the parent class declaration of a class, then the whole class becomes generic.

Your dataclass doesn't fit into either of these situations. The dataclass variable isn't a function argument, nor is it a parent class designator, so your type checker gets confused.
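The two rules, and the case that falls outside them, can be sketched as follows. T, first, Box and Loose are hypothetical names used only for this illustration.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # hypothetical unconstrained type variable


def first(items: List[T]) -> T:
    # Rule 1: T appears in the signature, so the function itself is generic.
    return items[0]


class Box(Generic[T]):
    # Rule 2: T appears in the base-class list, so it is in scope for the whole class.
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value


class Loose:
    # Neither rule applies: nothing has introduced T here. This runs fine, but it is
    # the same situation as the dataclass field in the question, which Pylance rejects.
    value: T
```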

So your current constructor code

class StringClass:
    def __init__(self, s: StringType = None):
        self.s = s

is similar to this Java code (pseudocode, since we can't represent the constraint str | None exactly in Java)

public class StringClass {
    public Object s;
    public<T> StringClass(T s) {
        this.s = s;
    }
}

That is, the class itself is not generic; only the constructor is. The instance variable s on the class is inferred to be the least upper bound of the valid types for the type variable. In Java, that's Object, and in Python (which has union types), that's str | None.

As pointed out in the comments, what you probably want is a union type.

class StringClass:
    def __init__(self, s: str | None = None):
        self.s = s

And you can alias these just like anything else.

StringType = str | None

(Note: If you're using a Python version older than 3.10, you'll need to use Union instead, since the | syntax between types is not supported at runtime until Python 3.10. The type checker has no issue with either spelling.)
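Applied to the dataclass from the question, the alias approach might look like the sketch below. It assumes the simple str-or-None case and spells the alias with Optional so that it also runs before Python 3.10.

```python
from dataclasses import dataclass
from typing import Optional

# Plain alias, not a TypeVar; equivalent to str | None on Python 3.10+.
StringType = Optional[str]


@dataclass
class StringClass:
    # An alias does not need to be introduced into any scope,
    # so it is valid in a class-level field annotation.
    s: StringType = None
```

With this definition, StringClass() and StringClass("hello") both construct normally, and the field s is typed as str or None everywhere.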

However, if you want your class to be generic, then you want the whole class to be generic, not just the constructor.

class StringClass(Generic[StringType]):
    def __init__(self, s: StringType = None):
        self.s = s

typing.Generic is a superclass designed specifically to introduce type variables. It does nothing else and does not add any methods to the class (except some reflection stuff to allow the [] syntax to work). Now your constructor isn't generic but your whole class is. You refer, in types, to StringClass with a type argument as StringClass[str] or StringClass[None].

This approach extends to your dataclass just fine. Dataclasses can have arbitrary superclasses, including Generic.

@dataclass
class StringClass(Generic[StringType]):
    s: StringType = None

Now the type variable is in scope, as it was introduced by the class itself, so it can be used in the instance fields just fine.
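Here is a minimal sketch of how such a generic dataclass could be used. The shout function and the variable names are hypothetical, and the None default is left out on purpose, on the assumption that a checker may question it once the class is parameterized as StringClass[str].

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

StringType = TypeVar("StringType", str, None)


@dataclass
class StringClass(Generic[StringType]):
    # StringType is in scope here because Generic[StringType] introduced it.
    s: StringType


def shout(c: "StringClass[str]") -> str:
    # Accepts only the str-parameterized variant, so .upper() is known to be safe.
    return c.s.upper()


explicit = StringClass[str]("hello")  # type argument written out
inferred = StringClass(None)          # a checker infers StringClass[None]
```

Passing inferred to shout would be flagged by the type checker, which is the extra precision the generic version buys.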

So, depending on your use case, I recommend either (1) introducing the type variable at class scope rather than constructor scope, or (2) declaring a type alias rather than a type variable, and using that. Which one to use comes down to whether, in your situation, it makes sense to keep track of the actual type of the argument at type-checking time. With the first approach, you'll be able to write StringClass[None] or StringClass[str] to further restrict the kind of StringClass that you expect in a particular situation. This can be useful, but it can also get tedious if you don't often need that information.

EDIT: I see the comments on the question now. Based on what you're describing, I think the type alias fits your use case better. So write

StringType = str | None | QuerySet | ModelBase

or, for Python 3.9 and older,

StringType = Union[str, None, QuerySet, ModelBase]