Initialize dataclass instance with functions

Time:10-05

I'm trying to create a dataclass to store all relevant data in a single object. How can I initialize a dataclass instance where the values are evaluated from functions within the dataclass, which take parameters?

This is where I am so far:

@dataclass
class Person: 
    def Name(self):
        return f'My name is {self.name[0]} {self.name[1]}.'

    def Age(self):
        return f'I am {self.age} years old.'

    name: field(default_factory=Name(self), init=True)
    age: field(default_factory=Age(self), init=True)

person = Person(('John', 'Smith'), '100')
print(person)

Current output:

Person(name=('John', 'Smith'), age='100')

This is the output I'm trying to achieve:

Person(name='My name is John Smith', age='I am 100 years old')

I was trying to use "How to reference `self` in dataclass' fields?" for reference on this topic.

CodePudding user response:

It is not possible to achieve what you are trying to do with dataclasses.field(...) alone, because the docs indicate that default_factory needs to be a zero-argument callable.

For instance, default_factory=list works because list() can be called with no arguments.

However, note that the following is not possible:

field(default_factory = lambda world: f'hello {world}!')

dataclasses calls the default_factory function with no arguments, so no value is passed for world and you will run into a TypeError as soon as the default is needed.

The good news is that there are a few alternatives to consider in your case, which I outline below.
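A minimal sketch of the difference, using two hypothetical classes (Greeting and BadGreeting are made up for illustration and are not part of the question): a zero-argument factory works, while a factory that expects an argument fails once the default is actually needed.

```python
from dataclasses import dataclass, field


@dataclass
class Greeting:
    # OK: list() can be called with no arguments
    words: list = field(default_factory=list)


@dataclass
class BadGreeting:
    # Accepted when the class is defined, but the factory is later
    # called with no arguments, so `world` is never supplied
    text: str = field(default_factory=lambda world: f'hello {world}!')


print(Greeting())  # Greeting(words=[])

try:
    BadGreeting()
except TypeError as exc:
    print(f'TypeError: {exc}')

# passing a value explicitly means the factory is never called
print(BadGreeting('hi').text)  # hi
```

Note that the failure only shows up at instantiation time, not when the class is defined.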

Init-only Variables

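To work around this, one option is to combine InitVar with field(init=False). The InitVar values are accepted by the constructor and handed to __post_init__, while name and age are left out of the constructor and assigned inside __post_init__: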

from dataclasses import field, dataclass, InitVar


@dataclass
class Person:

    in_name: InitVar[tuple[str, str]]
    in_age: InitVar[str]

    name: str = field(init=False)
    age: str = field(init=False)

    def __post_init__(self, in_name: tuple[str, str], in_age: str):
        self.name = f'My name is {in_name[0]} {in_name[1]}.'
        self.age = f'I am {in_age} years old.'


person = Person(('John', 'Smith'), '100')
print(person)

Prints:

Person(name='My name is John Smith.', age='I am 100 years old.')
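As a quick check on how this behaves (a sketch that repeats the Person class above so it runs on its own; the two checks at the end are only illustrative): the init-only variables are passed to __post_init__, but they are not stored on the instance and are not reported by dataclasses.fields().

```python
from dataclasses import InitVar, dataclass, field, fields


@dataclass
class Person:
    in_name: InitVar[tuple[str, str]]
    in_age: InitVar[str]

    name: str = field(init=False)
    age: str = field(init=False)

    def __post_init__(self, in_name, in_age):
        self.name = f'My name is {in_name[0]} {in_name[1]}.'
        self.age = f'I am {in_age} years old.'


person = Person(('John', 'Smith'), '100')

# only the real fields are reported, not the InitVar pseudo-fields
print([f.name for f in fields(person)])  # ['name', 'age']

# the init-only values were consumed by __post_init__ and not kept
print(hasattr(person, 'in_name'))  # False
```

One consequence of this design is that keyword arguments have to use the init-only names, i.e. Person(in_name=..., in_age=...).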

Properties

Another option is to use field-properties in dataclasses. In this case, the values are passed in to the constructor as before (i.e. a tuple and a str), and the setter method for each field-property generates a formatted string, which it stores in a private attribute, for example self._name.

Note that the behavior is unexpected when no values for field-properties are passed in to the constructor, due to how dataclasses currently handles (or rather silently ignores) properties.

To work around that, you can use a metaclass such as one I have outlined in this gist.

from dataclasses import field, dataclass


@dataclass
class Person:

    name: tuple[str, str]
    age: str

    # added to silence any IDE warnings
    _age: str = field(init=False, repr=False)
    _name: str = field(init=False, repr=False)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name: tuple[str, str]):
        self._name = f'My name is {name[0]} {name[1]}.'

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age: str):
        self._age = f'I am {age} years old.'


person = Person(('John', 'Smith'), '100')
print(person)

person.name = ('Betty', 'Johnson')
person.age = 150
print(person)

# note that a strange error is returned when no default value is passed for
# properties; you can use my gist to work around that.
# person = Person()

Prints:

Person(name='My name is John Smith.', age='I am 100 years old.')
Person(name='My name is Betty Johnson.', age='I am 150 years old.')
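The caveat about missing values can be reproduced with a cut-down sketch (a single field-property, no metaclass). As far as I can tell, dataclasses picks up the property object itself as the default for the field, so calling the constructor with no arguments hands that property object to the setter:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: tuple[str, str]
    _name: str = field(init=False, repr=False)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        self._name = f'My name is {name[0]} {name[1]}.'


try:
    Person()  # no value passed for the field-property
except TypeError as exc:
    # the setter received the property object, which is not subscriptable
    print(f'TypeError: {exc}')
```

Passing the value explicitly, as in the example above, avoids the problem.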

Descriptors

One last option I would be remiss not to mention, and one I would likely recommend as being a little easier to set up than properties, is the use of descriptors in Python.

From what I understand, descriptors are essentially an easier approach than declaring a ton of properties, especially if those properties all serve a similar purpose.

Here is an example of a custom descriptor class, named FormatValue:

from typing import Callable, Any


class FormatValue:
    __slots__ = ('fmt', 'private_name', )

    def __init__(self, fmt: Callable[[Any], str]):
        self.fmt = fmt

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        return value

    def __set__(self, obj, value):
        setattr(obj, self.private_name, self.fmt(value))

It can be used as follows, and works the same as the above example with properties:

from dataclasses import dataclass


@dataclass
class Person:
    name: 'tuple[str, str] | str' = FormatValue(lambda name: f'My name is {name[0]} {name[1]}.')
    age: 'str | int' = FormatValue(lambda age: f'I am {age} years old.')


person = Person(('John', 'Smith'), '100')
print(person)

person.name = ('Betty', 'Johnson')
person.age = 150
print(person)

Prints:

Person(name='My name is John Smith.', age='I am 100 years old.')
Person(name='My name is Betty Johnson.', age='I am 150 years old.')
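If default values are also wanted, one possible extension is sketched below. It relies on the documented behavior that dataclasses looks up a field's default by accessing the descriptor on the class (so __get__ is called with obj set to None), and that raising AttributeError there means "no default". The default parameter and the _MISSING sentinel are my own additions for illustration and are not part of the FormatValue class above.

```python
from dataclasses import dataclass
from typing import Any, Callable

_MISSING = object()  # hypothetical sentinel meaning "no default configured"


class FormatValue:
    __slots__ = ('fmt', 'default', 'private_name')

    def __init__(self, fmt: Callable[[Any], str], default: Any = _MISSING):
        self.fmt = fmt
        self.default = default

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # class-level access: @dataclass uses this to find the default
            if self.default is _MISSING:
                raise AttributeError(self.private_name)
            return self.default
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, self.fmt(value))


@dataclass
class Person:
    name: 'tuple[str, str] | str' = FormatValue(
        lambda name: f'My name is {name[0]} {name[1]}.',
        default=('John', 'Doe'),
    )
    age: 'str | int' = FormatValue(lambda age: f'I am {age} years old.', default=0)


print(Person().name)  # My name is John Doe.
print(Person(('Betty', 'Johnson'), 150).age)  # I am 150 years old.
```

The raw default is still run through the formatter, because the generated __init__ assigns it like any other value and that assignment goes through __set__.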