I have a dictionary of index conversions:
d = {0:[0,1,3], 1:[4,5,6], 2:[2,7,8], 3:[9]...}
where the keys (0, 1, 2, 3, ...) are row indices in array 1 and each value is the list of corresponding row indices in array 2.
Given array 1 of shape (len(d.keys()), n), where n is a constant, is it possible to use broadcasting or array indexing to create array 2 of shape (sum([len(value) for value in d.values()]), n)?
Here is what I have done so far using a for loop:
import numpy as np

d = {0:[0,1,3], 1:[4,5,6], 2:[2,7,8], 3:[9]}
arr1 = np.array([[0,1],
                 [np.nan,np.nan],
                 [np.nan,6.5],
                 [16,0.2]])
arr2 = np.full((10,2), np.nan)
# visit only the rows of arr1 that contain at least one non-NaN value
for idx in np.unique(np.where(~np.isnan(arr1))[0]):
    new_idx = d[idx]               # target rows in arr2
    arr2[new_idx,:] = arr1[idx,:]
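The row filter in the loop, np.unique(np.where(~np.isnan(arr1))[0]), selects every row of arr1 that has at least one non-NaN entry. A minimal sketch, using the same sample data, of an equivalent spelling with a boolean row mask; the names rows_loop and rows_mask are only illustrative:

```python
import numpy as np

arr1 = np.array([[0, 1],
                 [np.nan, np.nan],
                 [np.nan, 6.5],
                 [16, 0.2]])

# Rows visited by the loop: unique row indices of the non-NaN cells
rows_loop = np.unique(np.where(~np.isnan(arr1))[0])

# Same rows via a mask: keep a row unless *all* of its entries are NaN
rows_mask = np.flatnonzero(~np.isnan(arr1).all(axis=1))

print(rows_loop)  # [0 2 3]
print(rows_mask)  # [0 2 3]
```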
The actual arr1 I am working with has shape (600,n) and it is sparse (lots of np.nan values), which is why I set the default values of arr2 to np.nan and iterate only through the non-NaN rows. The actual arr2 has shape (198812,n). Any suggestions for how to speed up this conversion with a vectorized operation that doesn't involve a for loop?
Thanks!
Answer:
Here is an approach, though you will necessarily have to loop in Python at some point (here, to invert the dictionary):
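import numpy as np

# invert d: map each arr2 row to the arr1 row it is copied from
d2 = {v: k for k, l in d.items() for v in l}
# {0: 0, 1: 0, 3: 0, 4: 1, 5: 1, 6: 1, 2: 2, 7: 2, 8: 2, 9: 3}
arr2 = np.full((len(d2), arr1.shape[1]), np.nan)
arr2[list(d2)] = arr1[list(d2.values())]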
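A sketch of the same scatter that builds the two index arrays with np.repeat and np.concatenate instead of an inverted dictionary. It still walks over d once in Python to collect the lists (hard to avoid with a dict as input), but it yields NumPy integer arrays directly and sizes arr2 from the data. The names src and dst are illustrative, not from the answer above:

```python
import numpy as np

d = {0: [0, 1, 3], 1: [4, 5, 6], 2: [2, 7, 8], 3: [9]}
arr1 = np.array([[0, 1],
                 [np.nan, np.nan],
                 [np.nan, 6.5],
                 [16, 0.2]])

# d.keys() and d.values() iterate in the same order, so these line up
keys = np.fromiter(d.keys(), dtype=np.intp, count=len(d))
lengths = np.fromiter((len(v) for v in d.values()), dtype=np.intp, count=len(d))

# dst: every target row of arr2; src: the arr1 row each target copies from
dst = np.concatenate([np.asarray(v, dtype=np.intp) for v in d.values()])
src = np.repeat(keys, lengths)

arr2 = np.full((lengths.sum(), arr1.shape[1]), np.nan)
arr2[dst] = arr1[src]  # one fancy-indexing scatter, no per-row loop
```

Because arr2 starts out all-NaN, copying the all-NaN rows of arr1 changes nothing, so the non-NaN row filter from the question is not needed here. If d stays fixed while arr1 changes, src and dst can be computed once and reused.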
Output:
array([[ 0. ,  1. ],
       [ 0. ,  1. ],
       [ nan,  6.5],
       [ 0. ,  1. ],
       [ nan,  nan],
       [ nan,  nan],
       [ nan,  nan],
       [ nan,  6.5],
       [ nan,  6.5],
       [16. ,  0.2]])
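If, as in this example, the lists in d together contain every row index 0 to N-1 of arr2 exactly once (an assumption; the question only states the row count of arr2), the scatter can be turned into a single gather: build inv so that inv[j] is the arr1 row feeding arr2 row j, then take arr2 = arr1[inv]. No NaN pre-fill is needed, since NaN rows of arr1 are simply copied as NaN. A minimal sketch under that assumption, checked against the question's loop (inv, src, dst are illustrative names):

```python
import numpy as np

d = {0: [0, 1, 3], 1: [4, 5, 6], 2: [2, 7, 8], 3: [9]}
arr1 = np.array([[0, 1],
                 [np.nan, np.nan],
                 [np.nan, 6.5],
                 [16, 0.2]])

dst = np.concatenate([np.asarray(v, dtype=np.intp) for v in d.values()])
src = np.repeat(np.fromiter(d.keys(), dtype=np.intp, count=len(d)),
                [len(v) for v in d.values()])

# Assumption: dst is a permutation of range(len(dst)), i.e. one source per arr2 row
assert np.array_equal(np.sort(dst), np.arange(len(dst)))

inv = np.empty_like(src)
inv[dst] = src            # inv[j] = arr1 row that arr2 row j comes from
arr2_gather = arr1[inv]   # single gather; NaN rows propagate automatically

# Reference result from the question's loop
arr2_loop = np.full((len(dst), arr1.shape[1]), np.nan)
for idx in np.unique(np.where(~np.isnan(arr1))[0]):
    arr2_loop[d[idx], :] = arr1[idx, :]

print(np.array_equal(arr2_gather, arr2_loop, equal_nan=True))  # True
```

The gain is mainly when d is fixed: inv is built once, and each new arr1 then costs one indexing operation.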