I'm getting the error below when running the code snippet that follows it.
Any ideas on how to solve this?
I'm pretty much brand new to NumPy. I have spent most of my time using Pandas, but I'm trying to move away from it because of numerous performance-related issues.
The end goal is to run a LEFT JOIN on the two structured arrays.
The error seems to be triggered by the expression ret[i] = tuple(row1[f1]) + tuple(row2[f1]), but honestly I'm not certain why I'd be getting it.
I tested row1 and row2 to check the number of fields against f1, which contains the dtype keys, and it all seems to line up from what I can tell.
Any thoughts would be appreciated!
ERROR
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_43960/3997384146.py in <module>
66 # dtype=[('name', 'U10'), ('age', 'i4')])
67
---> 68 join_by_left(key='name', r1=struct_arr1, r2=struct_arr2, mask=True)
~\AppData\Local\Temp/ipykernel_43960/3997384146.py in join_by_left(key, r1, r2, mask)
43 print(row1[f1])
44 print(row2[f1])
---> 45 ret[i] = tuple(row1[f1]) + tuple(row2[f1])
46
47 i += 1
~\AppData\Roaming\Python\Python37\site-packages\numpy\ma\core.py in __setitem__(self, indx, value)
3379 elif not self._hardmask:
3380 # Set the data, then the mask
-> 3381 _data[indx] = dval
3382 _mask[indx] = mval
3383 elif hasattr(indx, 'dtype') and (indx.dtype == MaskType):
ValueError: could not assign tuple of length 6 to structure with 3 fields.
FULL CODE
import numpy as np
def join_by_left(key, r1, r2, mask=True):
    # figure out the dtype of the result array
    descr1 = r1.dtype.descr
    descr2 = [d for d in r2.dtype.descr if d[0] not in r1.dtype.names]
    descrm = descr1 + descr2
    # figure out the fields we'll need from each array
    f1 = [d[0] for d in descr1]
    f2 = [d[0] for d in descr2]
    # cache the number of columns in f1
    ncol1 = len(f1)
    print(f1)
    # get a dict of the rows of r2 grouped by key
    rows2 = {}
    for row2 in r2:
        rows2.setdefault(row2[key], []).append(row2)
    # figure out how many rows will be in the result
    nrowm = 0
    for k1 in r1[key]:
        if k1 in rows2:
            nrowm += len(rows2[k1])
        else:
            nrowm += 1
    # allocate the return array
    _ret = np.recarray(nrowm, dtype=descrm)
    if mask:
        ret = np.ma.array(_ret, mask=True)
    else:
        ret = _ret
    # merge the data into the return array
    i = 0
    for row1 in r1:
        if row1[key] in rows2:
            for row2 in rows2[row1[key]]:
                print(row1[f1])
                print(row2[f1])
                ret[i] = tuple(row1[f1]) + tuple(row2[f1])
                i += 1
        else:
            for j in range(ncol1):
                ret[i][j] = row1[j]
            i += 1
    return ret
struct_arr1 = np.array([('jason', 28, '[email protected]'), ('jared', 31, '[email protected]')],
                       dtype=[('name', 'U10'), ('age', 'i4'), ('email', 'U10')])
struct_arr2 = np.array([('jason', 22, '[email protected]'), ('jason', 27, '[email protected]'),
                        ('george', 28, '[email protected]'), ('jared', 22, '[email protected]')],
                       dtype=[('name', 'U10'), ('age', 'i4'), ('email', 'U10')])
join_by_left(key='name', r1=struct_arr1, r2=struct_arr2, mask=True)
CodePudding user response:
On the line where you're getting the error:
ret[i] = tuple(row1[f1]) + tuple(row2[f1])
The + operator concatenates the two tuples, so the result is a tuple with 6 elements, not a tuple of 3 with the elements added pairwise (if that is what you were expecting).
Simple example:
tuple('abc') + tuple('def')
Results in:
('a', 'b', 'c', 'd', 'e', 'f')
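As for why the target structure has only 3 fields: in the question's code, descr2 keeps only the fields of r2 whose names are not already in r1. Both arrays have identical field names, so descr2 and f2 come out empty and the result dtype is just the 3 fields of r1. The following is a minimal sketch of one possible way around that, not a definitive implementation: it renames the clashing r2 fields so they get their own columns, and masks those columns for rows of r1 that have no match. The names left_join_sketch, suffix and "_r2" are hypothetical, the arrays are shortened to two fields, and the extra unmatched 'maria' row is made up for illustration.

```python
import numpy as np


def left_join_sketch(key, r1, r2, suffix="_r2"):
    """Hypothetical helper: left join of two structured arrays on `key`."""
    # Non-key fields of r2; a name that already exists in r1 gets the suffix.
    r2_fields = [n for n in r2.dtype.names if n != key]
    r2_renamed = [n + suffix if n in r1.dtype.names else n for n in r2_fields]
    out_dtype = np.dtype(
        [(n, r1.dtype[n]) for n in r1.dtype.names]
        + [(new, r2.dtype[old]) for old, new in zip(r2_fields, r2_renamed)]
    )

    # Group the rows of r2 by key value.
    rows2 = {}
    for row2 in r2:
        rows2.setdefault(row2[key], []).append(row2)

    # One output row per match, or a single row when r1's key has no match.
    nrows = sum(max(len(rows2.get(k, ())), 1) for k in r1[key])
    data = np.zeros(nrows, dtype=out_dtype)
    mask = np.zeros(nrows, dtype=[(n, bool) for n in out_dtype.names])

    i = 0
    for row1 in r1:
        matches = rows2.get(row1[key], [])
        if matches:
            for row2 in matches:
                # Tuple length now equals the number of output fields.
                data[i] = row1.item() + tuple(row2[n] for n in r2_fields)
                i += 1
        else:
            # No match: copy r1's values and mask the columns taken from r2.
            for n in r1.dtype.names:
                data[n][i] = row1[n]
            for n in r2_renamed:
                mask[n][i] = True
            i += 1
    return np.ma.array(data, mask=mask)


# Illustrative data: names and ages from the question, email field omitted,
# plus an assumed 'maria' row that has no match in the second array.
dt = [("name", "U10"), ("age", "i4")]
arr1 = np.array([("jason", 28), ("jared", 31), ("maria", 40)], dtype=dt)
arr2 = np.array(
    [("jason", 22), ("jason", 27), ("george", 28), ("jared", 22)], dtype=dt
)

out = left_join_sketch("name", arr1, arr2)
print(out.dtype.names)
print(out)
```

Each assigned tuple has exactly as many elements as the output dtype has fields, so the assignment no longer fails. Rows of r1 without a match keep their own values and have the r2 columns masked.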