I am using pydantic to manage settings for an app that supports different datasets. Each dataset has a set of overridable defaults, but the defaults differ per dataset. Currently, I have all of the logic correctly implemented via validators:
from typing import Optional

from pydantic import BaseModel, validator

class DatasetSettings(BaseModel):
    dataset_name: str
    # needs a default so the always=True validator runs when the field is omitted
    table_name: Optional[str] = None

    @validator("table_name", always=True)
    def validate_table_name(cls, v, values):
        if isinstance(v, str):
            return v
        if values["dataset_name"] == "DATASET_1":
            return "special_dataset_1_default_table"
        if values["dataset_name"] == "DATASET_2":
            return "special_dataset_2_default_table"
        return "default_table"

class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: str
This way, I get different defaults based on dataset_name, but the user can override them if necessary. This is the desired behavior. The trouble is that once there are more than a handful of such fields and names, it gets to be a mess to read and maintain. It seems like inheritance/polymorphism would solve this problem, but the pydantic factory logic seems too hardcoded to make it feasible, especially with nested models:
class Dataset1Settings(DatasetSettings):
    dataset_name: str = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"

class Dataset2Settings(DatasetSettings):
    dataset_name: str = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"

def dataset_settings_factory(dataset_name, table_name=None):
    # only pass table_name through when the user actually overrides it,
    # so the subclass defaults apply otherwise (pydantic models take keyword args)
    overrides = {} if table_name is None else {"table_name": table_name}
    if dataset_name == "DATASET_1":
        return Dataset1Settings(**overrides)
    if dataset_name == "DATASET_2":
        return Dataset2Settings(**overrides)
    return DatasetSettings(dataset_name=dataset_name, **overrides)

class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: str
Options I've considered:
- Create a new set of default dataset settings models, override __init__ of DatasetSettings, instantiate the subclass, and copy its attributes into the parent class. Kind of clunky.
- Override __init__ of AppSettings, using the dataset_settings_factory to set the dataset_settings attribute of AppSettings (sketched just below). Not so good because the default behavior doesn't work in DatasetSettings at all, only when it is instantiated as a nested model in AppSettings.
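Roughly what I mean by that second option (an untested sketch, reusing the classes and factory above):
class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: str

    def __init__(self, **data):
        ds = data.get("dataset_settings")
        if isinstance(ds, dict):
            # route the raw dict through the factory so the matching subclass
            # supplies its defaults before normal validation runs
            data["dataset_settings"] = dataset_settings_factory(
                ds.get("dataset_name"), ds.get("table_name")
            )
        super().__init__(**data)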
I was hoping Field(default_factory=dataset_settings_factory) would work, but default_factory is only for actual defaults, so it takes zero arguments. Is there some other way to intercept the args of a particular pydantic field and use a custom factory?
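For reference, the callable passed to default_factory is called with no arguments at all, so the most it could do is pin one fixed dataset (hard-coding DATASET_1 here purely for illustration):
from pydantic import BaseModel, Field

class AppSettings(BaseModel):
    # the factory is called with zero arguments, so it never sees the
    # incoming dataset_name and can only produce one fixed default
    dataset_settings: DatasetSettings = Field(
        default_factory=lambda: DatasetSettings(dataset_name="DATASET_1")
    )
    app_url: str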
CodePudding user response:
Another option would be to use a Discriminated/Tagged Union.
But your solution (without looking in detail) looks fine too.
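For example, a minimal sketch (assuming pydantic >= 1.9, which added discriminated unions; class and field names are borrowed from the question):
from typing import Literal, Union

from pydantic import BaseModel, Field

class Dataset1Settings(BaseModel):
    dataset_name: Literal["DATASET_1"] = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"

class Dataset2Settings(BaseModel):
    dataset_name: Literal["DATASET_2"] = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"

class AppSettings(BaseModel):
    # pydantic dispatches on the "dataset_name" tag to pick the right submodel
    dataset_settings: Union[Dataset1Settings, Dataset2Settings] = Field(
        ..., discriminator="dataset_name"
    )
    app_url: str

app_settings = AppSettings(
    dataset_settings={"dataset_name": "DATASET_2"}, app_url="http://example.com"
)
assert app_settings.dataset_settings.table_name == "special_dataset_2_default_table"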
CodePudding user response:
I ended up solving the problem following the first option, as follows. Code is runnable with pydantic 1.8.2 and pydantic 1.9.1.
from typing import Optional

from pydantic import BaseModel, Field

class DatasetSettings(BaseModel):
    dataset_name: Optional[str] = Field(default="DATASET_1")
    table_name: Optional[str] = None

    def __init__(self, **data):
        factory_dict = {"DATASET_1": Dataset1Settings, "DATASET_2": Dataset2Settings}
        dataset_name = (
            data["dataset_name"]
            if "dataset_name" in data
            else self.__fields__["dataset_name"].default
        )
        if dataset_name in factory_dict:
            # let the matching "duck" model fill in its defaults, then validate
            # the merged dict as a plain DatasetSettings
            data = factory_dict[dataset_name](**data).dict()
        super().__init__(**data)
class Dataset1Settings(BaseModel):
    dataset_name: str = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"

class Dataset2Settings(BaseModel):
    dataset_name: str = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"

class AppSettings(BaseModel):
    dataset_settings: DatasetSettings = Field(default_factory=DatasetSettings)
    app_url: Optional[str]
app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_1"})
assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_2"})
assert app_settings.dataset_settings.table_name == "special_dataset_2_default_table"
# bonus: no args mode
app_settings = AppSettings()
assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
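One more check, not in the asserts above: an explicitly passed table_name still overrides the per-dataset default (the name "my_custom_table" is just an arbitrary example):
app_settings = AppSettings(
    dataset_settings={"dataset_name": "DATASET_2", "table_name": "my_custom_table"}
)
assert app_settings.dataset_settings.table_name == "my_custom_table"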
A couple of gotchas I discovered along the way:
- If Dataset1Settings inherits from DatasetSettings, it enters a recursive loop, calling __init__ on __init__ ad infinitum. This could be broken with some introspection, but I opted for the duck approach.
- The current solution destroys any validators on DatasetSettings. I'm sure there's a way to call the validation logic anyway, but the current solution effectively sidesteps whatever class-level validation you have by only initing with super().__init__.
- The same thing works for BaseSettings objects, but you have to drag their cumbersome init args along:
def __init__(
    self,
    _env_file: Union[Path, str, None] = None,
    _env_file_encoding: Optional[str] = None,
    _secrets_dir: Union[Path, str, None] = None,
    **values: Any,
):
    ...