Is it possible to group variables in a python dataclass?

Time:10-16

I have searched but found no good answer, so I'll make a post about it :)

I'm currently creating a Python module which uses an HTTP GET request to fetch an object with a bunch of data, structured like this:

  • Main group
    • Group 1
      • data1
      • data2
    • Group 2
      • data1
      • data2
    • Group 3
      • data1
      • data2

I have created a dataclass which just lists all these variables, like this:

from dataclasses import dataclass


@dataclass
class MyData:
    grp1_data1: str
    grp1_data2: str
    grp2_data1: str
    grp2_data2: str
    grp3_data1: str
    grp3_data2: str

    # Alternate constructor: build an instance from the nested response dict
    @classmethod
    def from_dict(cls, data: dict) -> "MyData":
        return cls(
            grp1_data1=data["Main group"]["Group 1"]["data1"],
            grp1_data2=data["Main group"]["Group 1"]["data2"],
            # And so on ...
        )

What I'm looking for is a way to group the variables inside the dataclass, similar to a struct, so that I don't need to mix the group name and the data name in the variable name.

I'm quite new to Python and don't know which grouping features, if any, work with dataclasses.

I would like to be able to write something like grp1.data1=data["Main group"]["Group 1"]["data1"] or similar.

CodePudding user response:

Your question is a bit unclear, but as suggested in the comments, it would be best to have a single dataclass model that represents one group's data (i.e. a model containing the data1 and data2 fields), and to define a helper method that builds a mapping of group number to model instance, as shown below.

Note: This assumes you are using Python 3.8+. For earlier versions, I would do two things:

  • Remove the __future__ import if needed, and instead import Type and Dict from the typing module, as the builtin types don't support subscripted values in Python 3.8 or earlier.
  • Remove the usage of the walrus := operator that was introduced in Python 3.8, and instead use the commented line that follows it.

# Future import to allow the `int | str` syntax below
# Can be removed for Python 3.10+
from __future__ import annotations

from dataclasses import dataclass
from typing import TypeVar


# Create a type that can be `MyData`, or any subclass
D = TypeVar('D', bound='MyData')


@dataclass
class MyData:
    data1: str
    data2: str

    @classmethod
    def from_dict(cls: type[D], data: dict, group_num: int | str) -> D:
        return cls(
            data1=data['MG'][f'G {group_num}']['data1'],
            data2=data['MG'][f'G {group_num}']['data2'],
        )

    @classmethod
    def group_to_data(cls: type[D], data: dict) -> dict[int, D]:
        return {(group_num := int(group_key.split()[-1])): cls.from_dict(
                    data, group_num)
                for group_key in data['MG']}

        # For Python 3.7 or lower, uncomment and use the below instead
        # ret_dict = {}
        # for group_key in data['MG']:
        #     group_num = int(group_key.split()[-1])
        #     ret_dict[group_num] = cls.from_dict(data, group_num)
        #
        # return ret_dict

Code for testing:

def main():
    from pprint import pprint

    my_data = {
        'MG': {
            'G 1': {
                'data1': 'hello',
                'data2': 'World!',
            },
            'G 2': {
                'data1': '',
                'data2': 'Testing',
            },
            'G 3': {
                'data1': 'hello 123',
                'data2': 'world 321!'
            }
        }
    }

    group_to_data = MyData.group_to_data(my_data)
    pprint(group_to_data)

    # Passes: group 1 was parsed into an equal MyData instance
    assert group_to_data[1] == MyData('hello', 'World!')


if __name__ == '__main__':
    main()

Output:

{1: MyData(data1='hello', data2='World!'),
 2: MyData(data1='', data2='Testing'),
 3: MyData(data1='hello 123', data2='world 321!')}
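
For readers stuck on Python 3.7, the two adjustments from the note above might look roughly like the sketch below. This is not part of the answer's original code: the class name `MyData37` and the `sample` dict are made up here purely for illustration, and `Union` is imported because the `int | str` syntax is also unavailable without the `__future__` import.

```python
from dataclasses import dataclass
from typing import Dict, Type, TypeVar, Union

# Hypothetical name, chosen so it does not clash with MyData above
D = TypeVar('D', bound='MyData37')


@dataclass
class MyData37:
    data1: str
    data2: str

    @classmethod
    def from_dict(cls: Type[D], data: dict, group_num: Union[int, str]) -> D:
        group = data['MG'][f'G {group_num}']
        return cls(data1=group['data1'], data2=group['data2'])

    @classmethod
    def group_to_data(cls: Type[D], data: dict) -> Dict[int, D]:
        # Plain loop in place of the walrus operator (which needs 3.8+)
        ret_dict = {}
        for group_key in data['MG']:
            group_num = int(group_key.split()[-1])
            ret_dict[group_num] = cls.from_dict(data, group_num)
        return ret_dict


# Hypothetical sample data in the same shape as the test dict above
sample = {
    'MG': {
        'G 1': {'data1': 'a', 'data2': 'b'},
        'G 2': {'data1': 'c', 'data2': 'd'},
    }
}
groups = MyData37.group_to_data(sample)
```

The behaviour should match the 3.8+ version; only the type hints and the loop are spelled differently.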

CodePudding user response:

It is possible to create multilevel dataclasses that do what you want (perhaps not as elegant as C-style structs, but it works) by using class composition:

from dataclasses import dataclass


@dataclass
class Top:

    # Nested dataclass describing one group
    @dataclass
    class Child:
        data1: str
        data2: str

    Group1: Child
    Group2: Child
    Group3: Child


# create an instance
inst = Top(
    Group1=Top.Child('a', 'b'),
    Group2=Top.Child('x', 'y'),
    Group3=Top.Child('101', '102')
)

# check it:
assert inst.Group2.data2 == 'y'

The key is that every child member must itself be defined as a class (a dataclass is convenient, but strictly speaking any class works). You can define the child class(es) in place, as above, or separately.
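
To tie this back to the question, here is a rough sketch of the "defined separately" variant together with a `from_dict` helper. The names `Group`, `MainGroup`, `grp1` to `grp3`, and the `payload` dict are invented for illustration; only the dictionary keys follow the layout given in the question.

```python
from dataclasses import dataclass


@dataclass
class Group:
    data1: str
    data2: str

    @classmethod
    def from_dict(cls, data: dict) -> "Group":
        # `data` is the dict for a single group, e.g. data["Main group"]["Group 1"]
        return cls(data1=data["data1"], data2=data["data2"])


@dataclass
class MainGroup:
    grp1: Group
    grp2: Group
    grp3: Group

    @classmethod
    def from_dict(cls, data: dict) -> "MainGroup":
        main = data["Main group"]
        return cls(
            grp1=Group.from_dict(main["Group 1"]),
            grp2=Group.from_dict(main["Group 2"]),
            grp3=Group.from_dict(main["Group 3"]),
        )


# Hypothetical payload standing in for the HTTP response body
payload = {
    "Main group": {
        "Group 1": {"data1": "a", "data2": "b"},
        "Group 2": {"data1": "x", "data2": "y"},
        "Group 3": {"data1": "101", "data2": "102"},
    }
}

my_data = MainGroup.from_dict(payload)
print(my_data.grp1.data1)
```

With this layout the access pattern asked for in the question, `my_data.grp1.data1`, works without mixing the group name into each field name.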
