Parametrized return type hint with pandas DataFrames

I have a function that can return different types, e.g. either a dict or a pd.DataFrame. To provide appropriate type hints, I naively thought I could just do something like this:

T = TypeVar("T")

def get_obj(key: str, typ: T = Any) -> T:
    # load obj using key
    return obj

But now when I do this

df = get_obj(key="asd", typ=pd.DataFrame)

my linter thinks df is the class pd.DataFrame itself, not an instance of pd.DataFrame. E.g. it says (variable) df Type[DataFrame] (and not (variable) df DataFrame), offers me different instantiation methods, and complains when I do something like df.loc.

I know I could do something like this:

@overload
def get_obj(key: str, is_df: Literal[False]) -> Any:
    ...

@overload
def get_obj(key: str, is_df: Literal[True]) -> pd.DataFrame:
    ...

def get_obj(key: str, is_df: bool = False):
    # load obj using key
    return obj

but then I have to hard-code all possible types.

CodePudding user response:

This is a very cool question! I have working code below that produces correct IntelliSense based on the typ you pass to your function.

The linter gets it wrong because typ is annotated as a bare T rather than as Type[T]. With typ: T, passing the class pd.DataFrame binds T to the type of that argument, which is Type[DataFrame], so that is what the function appears to return. The value passed as typ is, after all, a reference to a class, not an instance or a string.

Luckily, the typing module has Type, which covers exactly this use case (on Python 3.9+ the built-in type[T] works as well).

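from typing import Type, TypeVar, cast
import pandas as pd


item1: pd.DataFrame = pd.DataFrame()
item2: int = 2
item3: str = "HelloWorld!"

# Stand-in for wherever the objects are actually loaded from
x = {
    "one": item1,
    "two": item2,
    "three": item3
}

T = TypeVar("T")

def get_obj(key: str, typ: Type[T]) -> T:
    # load obj using key; cast() only informs the type checker,
    # it performs no check or conversion at runtime
    obj = cast(T, x[key])
    print(type(obj))
    return obj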


y = get_obj(key="one", typ=pd.DataFrame)

Note that because the class passed as typ varies from call to call, we still need the TypeVar to tie the return type to it.

The code above makes the IDE see the variable y as a pd.DataFrame. Nothing is checked at runtime, though: if you instead write y = get_obj(key="one", typ=int), your IDE will treat the object as an int, while it actually is a pd.DataFrame.
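If that mismatch is a concern, one option is to verify the loaded object against typ with isinstance before returning it. The sketch below is only an illustration under a few assumptions: get_obj_checked and _STORE are hypothetical names, the store holds built-in types so the snippet runs without pandas (pd.DataFrame would be passed the same way), and typ is assumed to be a plain class. It also uses a pair of overloads so that typ can be omitted, as in the question's original signature, in which case the result is typed as Any.

```python
from typing import Any, Dict, Optional, Type, TypeVar, overload

T = TypeVar("T")

# Hypothetical in-memory store standing in for the real loader.
_STORE: Dict[str, Any] = {"one": {"a": 1}, "two": 2, "three": "HelloWorld!"}


@overload
def get_obj_checked(key: str) -> Any: ...
@overload
def get_obj_checked(key: str, typ: Type[T]) -> T: ...


def get_obj_checked(key: str, typ: Optional[type] = None) -> Any:
    obj = _STORE[key]
    # Assumes typ is a plain class; subscripted generics such as List[int]
    # are not valid isinstance() targets.
    if typ is not None and not isinstance(obj, typ):
        raise TypeError(
            f"{key!r} holds {type(obj).__name__}, expected {typ.__name__}"
        )
    return obj


untyped = get_obj_checked("one")      # inferred as Any
number = get_obj_checked("two", int)  # inferred as int, verified at runtime
```

With this variant, get_obj_checked("one", int) raises a TypeError instead of silently handing back a dict that the IDE believes is an int.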