Numpy / Python - ValueError: could not assign tuple of length 6 to structure with 3 fields

Time:12-28

I'm getting the error below when I run the code snippet that follows it.

Any ideas on how to solve this?

Pretty much brand new to using Numpy - have spent most of my time using Pandas but trying to move away from using Pandas for numerous performance related issues.

End goal is to run a LEFT JOIN on the two structured arrays.

The error seems to be prompted by the ret[i] = tuple(row1[f1]) + tuple(row2[f1]) expression, but honestly I'm not certain why I'd be getting this error.

Tested the row1 and row2 to check the number of fields vs. the f1 which contains the dtype keys, and it all seems to line up from what I can tell.

Any thoughts would be appreciated!

ERROR

ValueError                       Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_43960/3997384146.py in <module>
     66 #            dtype=[('name', 'U10'), ('age', 'i4')])
     67 
---> 68 join_by_left(key='name', r1=struct_arr1, r2=struct_arr2, mask=True)

~\AppData\Local\Temp/ipykernel_43960/3997384146.py in join_by_left(key, r1, r2, mask)
     43                 print(row1[f1])
     44                 print(row2[f1])
---> 45                 ret[i] = tuple(row1[f1]) + tuple(row2[f1])
     46 
     47                 i += 1

~\AppData\Roaming\Python\Python37\site-packages\numpy\ma\core.py in __setitem__(self, indx, value)
   3379         elif not self._hardmask:
   3380             # Set the data, then the mask
-> 3381             _data[indx] = dval
   3382             _mask[indx] = mval
   3383         elif hasattr(indx, 'dtype') and (indx.dtype == MaskType):

ValueError: could not assign tuple of length 6 to structure with 3 fields.

FULL CODE

import numpy as np

def join_by_left(key, r1, r2, mask=True):
    # figure out the dtype of the result array
    descr1 = r1.dtype.descr
    descr2 = [d for d in r2.dtype.descr if d[0] not in r1.dtype.names]
    descrm = descr1 + descr2

    # figure out the fields we'll need from each array
    f1 = [d[0] for d in descr1]
    f2 = [d[0] for d in descr2]

    # cache the number of columns in f1
    ncol1 = len(f1)
    
    print(f1)

    # get a dict of the rows of r2 grouped by key
    rows2 = {}
    for row2 in r2:
        rows2.setdefault(row2[key], []).append(row2)

    # figure out how many rows will be in the result
    nrowm = 0
    for k1 in r1[key]:
        if k1 in rows2:
            nrowm += len(rows2[k1])
        else:
            nrowm += 1

    # allocate the return array
    _ret = np.recarray(nrowm, dtype=descrm)
    if mask:
        ret = np.ma.array(_ret, mask=True)
    else:
        ret = _ret

    # merge the data into the return array
    i = 0
    for row1 in r1:
        if row1[key] in rows2:
            for row2 in rows2[row1[key]]:
                print(row1[f1])
                print(row2[f1])
                ret[i] = tuple(row1[f1]) + tuple(row2[f1])

                i += 1
        else:
            for j in range(ncol1):
                ret[i][j] = row1[j]
            i += 1

    return ret


struct_arr1 = np.array([('jason', 28, '[email protected]'), ('jared', 31, '[email protected]')],
           dtype=[('name', 'U10'), ('age', 'i4'), ('email', 'U10')])

struct_arr2 = np.array([('jason', 22, '[email protected]'), ('jason', 27, '[email protected]'), ('george', 28, '[email protected]'), ('jared', 22, '[email protected]')],
           dtype=[('name', 'U10'), ('age', 'i4'), ('email', 'U10')])


join_by_left(key='name', r1=struct_arr1, r2=struct_arr2, mask=True)

CodePudding user response:

On the line where you're getting the error:

ret[i] = tuple(row1[f1]) + tuple(row2[f1])

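The + operator concatenates the two tuples, so the result is a tuple with 6 elements, not a tuple of 3 with the elements added pairwise (if that is what you were expecting). The result array only has 3 fields, because descr2 drops every field of r2 whose name already exists in r1, and here all three names match.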

Simple example:

tuple('abc') + tuple('def')

Results in:

('a', 'b', 'c', 'd', 'e', 'f')
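
One way to avoid the mismatch is to take from row2 only the fields that are not already in r1 (the f2 list in your function), so the concatenated tuple has exactly as many items as the result dtype has fields. The snippet below is only a minimal sketch of that idea: the arrays `left` and `right`, the `city` field and the names `out_dtype`, `good` and `bad` are made up for illustration and are not taken from your code.

```python
import numpy as np

# Hypothetical sample data: apart from the key, `right` uses a different
# field name than `left`, so the joined dtype gains one extra field.
left = np.array([("jason", 28), ("jared", 31)],
                dtype=[("name", "U10"), ("age", "i4")])
right = np.array([("jason", "NY"), ("jason", "LA"), ("george", "SF")],
                 dtype=[("name", "U10"), ("city", "U10")])

# All fields of `left`, plus the fields that only exist in `right`.
f1 = list(left.dtype.names)
f2 = [n for n in right.dtype.names if n not in f1]
out_dtype = ([(n, left.dtype[n]) for n in f1]
             + [(n, right.dtype[n]) for n in f2])

row1, row2 = left[0], right[0]

# Taking every field of row2 gives 4 values for a 3-field structure.
bad = tuple(row1[n] for n in f1) + tuple(row2[n] for n in right.dtype.names)

# Taking only the f2 fields of row2 gives exactly 3 values.
good = tuple(row1[n] for n in f1) + tuple(row2[n] for n in f2)

out = np.zeros(1, dtype=out_dtype)
out[0] = good   # tuple length matches the number of fields
print(out)

try:
    out[0] = bad   # same kind of mismatch as in the question
except ValueError as exc:
    print(exc)
```

With your sample arrays every field name in r2 also exists in r1, so f2 is empty and nothing from r2 would be carried over. If you want both age and email columns in the result, the fields coming from r2 would need distinct names in the output dtype.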