Python - numpy - combine several recarrays loaded from JSON into one recarray

My goal is to simplify access to several JSON files by loading each one into a recarray and nesting those inside one global recarray, so that values can be read like: parms.logic.myVarA.

Desired structure
parms (recarray)
[
    logic (recarray)
        - myVarA (int)
    tool (recarray)
        - myVarA (int)
]

Example:
parms.logic.myVarA
parms.tool.myVarA

I'm having some trouble understanding numpy.recarray and would appreciate help with a small piece of code. Here is a class that tests what I want to achieve:

import numpy as np
import simplejson, pathlib

# Loaders
class Loaders(np.recarray):
    def __new__(_, targets):

        content = [[],[]]
        for k, v in targets.items():

            # Load child          
            load = None
            if(pathlib.Path(v).suffix == '.json'):
                with open(v, 'r', encoding='utf-8') as file:
                    child = [[],[]]
                    for k_, v_ in simplejson.load(file).items():
                        child[0].append(v_)
                        child[1].append((k_, type(v_)))
                        
                    load = np.array(tuple(child[0]), dtype=np.dtype(child[1])).view(np.recarray)

                    print(f'CHILD {k} {type(load)}')
                    print(f'Brute {child}')     
                    print(f'Check {k}.myVarA{load.myVarA}\n')
    
            if(load):
                # Add child
                content[0].append(load)    
                content[1].append((k, type(load)))
            
        print('------ Loaded ------')
        print(f'Brute {content}')

        return np.array(tuple(content[0]), dtype=np.dtype(content[1])).view(np.recarray)

if __name__ == '__main__': 
    try:
        # FAILURE
        print('\n------ Loading ------')
        parms = Loaders({
            'logic' : './test/logic/parms.json', 
            'tool'  : './test/tool/parms.json'
        })

        print('\n------ Final check ------')
        print(f'parms dtypes {parms.dtype.names}')
        print(f'parms.logic {parms.logic} {type(parms.logic)}')
        print(f'Check parms.logic.myVarA{parms.logic.myVarA}')

    except Exception as e:
        print(f'Test failure {e}')

Output

CHILD logic <class 'numpy.recarray'>
Brute [[12, 44], [('myVarA', <class 'int'>), ('valB', <class 'int'>)]]
Check logic.myVarA12

CHILD tool <class 'numpy.recarray'>
Brute [[45], [('myVarA', <class 'int'>)]]
Check tool.myVarA45

------ Loaded ------
Brute [[rec.array((12, 44),
          dtype=[('myVarA', '<i8'), ('valB', '<i8')]), rec.array((45,),
          dtype=[('myVarA', '<i8')])], [('logic', <class 'numpy.recarray'>), ('tool', <class 'numpy.recarray'>)]]

------ Final check ------
parms dtypes ('logic', 'tool')
parms.logic (12, 44) <class 'numpy.ndarray'>
Test failure 'numpy.ndarray' object has no attribute 'myVarA'

I can see that the type of 'logic' changes once it is accessed through 'parms', but I don't understand why.

A check of the 'parms' recarray dtype shows that 'logic' and 'tool' are present, but they come back as ndarray. Yet earlier, when each child was created, its type was indeed recarray:

CHILD logic <class 'numpy.recarray'>
parms dtypes ('logic', 'tool')
print(f'Check parms.logic.valA {parms.logic.valA}')
Test failure 'numpy.ndarray' object has no attribute 'valA'

If anyone has an idea what is going wrong, or knows a simpler way to do this, I'm interested. Thank you in advance.

Answer:

With this dtype:

In [169]: dt = np.dtype([('logic',[('myVarA',int)]),('tool',[('myVarA',int)])])

I can make a recarray (the values look "random" because np.recarray allocates uninitialized memory):

In [170]: arr = np.recarray(3,dt)    
In [171]: arr
Out[171]: 
rec.array([((1969973520,), (598,)), ((1969973584,), (598,)),
           ((1969973552,), (598,))],
          dtype=[('logic', [('myVarA', '<i4')]), ('tool', [('myVarA', '<i4')])])

And access by attribute:

In [172]: arr.logic
Out[172]: 
rec.array([(1969973520,), (1969973584,), (1969973552,)],
          dtype=[('myVarA', '<i4')])    
In [173]: arr.logic.myVarA
Out[173]: array([1969973520, 1969973584, 1969973552])

or by field name (as with a structured array):

In [174]: arr['logic']['myVarA']
Out[174]: array([1969973520, 1969973584, 1969973552])
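
Applied to the question, the same kind of nested dtype can be built from the loaded JSON content. The following is only a minimal sketch: the helper name build_parms is hypothetical, the dicts stand in for what simplejson.load would return, and it assumes every JSON value is a plain int or float (strings would need an explicit width such as 'U32').

```python
import numpy as np

def build_parms(sections):
    """Build a one-record recarray with one nested structured field per section.

    `sections` is a hypothetical stand-in for the loaded JSON files, e.g.
    {'logic': {'myVarA': 12, 'valB': 44}, 'tool': {'myVarA': 45}}.
    Assumes every value is an int or a float.
    """
    fields = []
    values = []
    for name, mapping in sections.items():
        # Each child becomes a nested structured dtype, not a Python class.
        child_dtype = [(key, type(val)) for key, val in mapping.items()]
        fields.append((name, child_dtype))
        values.append(tuple(mapping.values()))
    # One record whose fields are themselves records (nested tuples).
    arr = np.array([tuple(values)], dtype=np.dtype(fields))
    return arr.view(np.recarray)

parms = build_parms({
    'logic': {'myVarA': 12, 'valB': 44},
    'tool': {'myVarA': 45},
})
print(parms.logic.myVarA)   # array holding the single value 12
print(parms.tool.myVarA)    # array holding the single value 45
```

Because the array has one record, each leaf comes back as a length-1 array; parms.logic.myVarA[0] gives the scalar.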

Another way of nesting recarrays is to use object dtypes:

In [229]: dt1 = np.dtype([('logic',object),('tool',object)])
In [230]: dt2 = np.dtype([('myVarA',int)])

In [231]: arr1 = np.recarray(2, dt1)
In [232]: arr1
Out[232]: 
rec.array([(None, None), (None, None)],
          dtype=[('logic', 'O'), ('tool', 'O')])

The only way I have found to fill this with recarrays is element by element:

In [233]: for i in range(2):
     ...:     for n in dt1.names:
     ...:         arr1[n][i] = np.recarray(0, dt2)
     ...:         

In [234]: arr1
Out[234]: 
rec.array([(rec.array([],
                     dtype=[('myVarA', '<i4')]), rec.array([],
                     dtype=[('myVarA', '<i4')]))              ,
           (rec.array([],
                     dtype=[('myVarA', '<i4')]), rec.array([],
                     dtype=[('myVarA', '<i4')]))              ],
          dtype=[('logic', 'O'), ('tool', 'O')])

which allows this access:

In [235]: arr1.logic[0].myVarA
Out[235]: array([], dtype=int32)
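
The object-dtype route can also hold children whose fields differ, as in the question. This is a rough sketch under assumptions: the child values are typed in by hand in place of the JSON files, and the names children and parent are illustrative only.

```python
import numpy as np

# Hand-written stand-ins for the two JSON files (assumed content).
children = {
    'logic': np.rec.array([(12, 44)], dtype=[('myVarA', int), ('valB', int)]),
    'tool': np.rec.array([(45,)], dtype=[('myVarA', int)]),
}

# Parent with one object field per child; a single record is enough here.
parent = np.recarray(1, dtype=[(name, object) for name in children])
for name, child in children.items():
    # Element-wise assignment stores the recarray itself in the object slot.
    parent[name][0] = child

# parent.logic is a plain object ndarray, so an index is needed first.
print(type(parent.logic))           # plain ndarray of objects
print(parent.logic[0].myVarA)       # field of the stored recarray
print(parent.tool[0].myVarA)
```

The cost of this layout is the extra [0]: parent.logic.myVarA fails for the same reason as in the question, because an object field is returned as an ordinary ndarray.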

This may be more detail than you need, but recarray is essentially just a numpy array subclass with a custom method for fetching a field. If the attribute isn't one of the standard array methods or attributes, it looks in dtype.fields for a matching name:

def __getattribute__(self, attr):
    # See if ndarray has this attr, and return it if so. (note that this
    # means a field with the same name as an ndarray attr cannot be
    # accessed by attribute).
    try:
        return object.__getattribute__(self, attr)
    except AttributeError:  # attr must be a fieldname
        pass

    # look for a field with this name
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]
    except (TypeError, KeyError) as e:
        raise AttributeError("recarray has no attribute %s" % attr) from e
    obj = self.getfield(*res)

    # At this point obj will always be a recarray, since (see
    # PyArray_GetField) the type of obj is inherited. Next, if obj.dtype is
    # non-structured, convert it to an ndarray. Then if obj is structured
    # with void type convert it to the same dtype.type (eg to preserve
    # numpy.record type if present), since nested structured fields do not
    # inherit type. Don't do this for non-void structures though.
    if obj.dtype.names is not None:
        if issubclass(obj.dtype.type, nt.void):
            return obj.view(dtype=(self.dtype.type, obj.dtype))
        return obj
    else:
        return obj.view(ndarray)
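
The branches in that method can be observed directly. The sketch below uses a made-up dtype (the field names count, blob and size are illustrative) to show which kind of field keeps recarray behaviour and which is handed back as a plain ndarray, which appears to be what happened to 'logic' in the question.

```python
import numpy as np

dt = np.dtype([
    ('logic', [('myVarA', int)]),  # structured (nested) field
    ('count', int),                # plain numeric field
    ('blob', object),              # object field
    ('size', int),                 # name collides with ndarray.size
])
arr = np.recarray(2, dt)  # numeric values are uninitialized; only types matter

print(type(arr.logic))     # recarray: nested attribute access keeps working
print(type(arr.count))     # plain ndarray
print(type(arr.blob))      # plain ndarray, even if its elements are recarrays
print(arr.size)            # the ndarray attribute wins over the field
print(arr['size'].shape)   # the field is still reachable by name
```

Only a field whose dtype is itself structured comes back as a recarray, so chained access like parms.logic.myVarA needs a nested dtype on the parent instead of a Python class such as type(load).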