NumPy broadcast using a dictionary of index conversions

I have a dictionary of index conversions:

d = {0:[0,1,3], 1:[4,5,6], 2:[2,7,8], 3:[9]...}

where keys (0,1,2,3...) represent the indices in array 1 and values represent the list of equivalent indices in array 2.

Given array 1 of shape (len(d.keys()), n), where n is a constant, is it possible to broadcast array 1 to create array 2 of shape (sum([len(value) for value in d.values()]), n)?

Here is what I have done so far using a for loop:

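import numpy as np

d = {0:[0,1,3], 1:[4,5,6], 2:[2,7,8], 3:[9]}

arr1 = np.array([[0,1],
                 [np.nan,np.nan],
                 [np.nan,6.5],
                 [16,0.2]])

# arr2 starts out all-NaN; np.nan is used because the np.NaN alias was removed in NumPy 2.0
arr2 = np.full((10,2),np.nan)

# visit only the rows of arr1 that contain at least one non-NaN value
for idx in np.unique(np.where(~np.isnan(arr1))[0]):

    new_idx = d[idx]                 # rows of arr2 that correspond to row idx of arr1
    arr2[new_idx,:] = arr1[idx,:]    # the single row is broadcast to all of them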

The actual arr1 I am working with has shape (600,n) and is sparse (lots of np.nan values), which is why I set the default values of arr2 to np.nan and iterate only over the non-NaN rows. The actual arr2 has shape (198812,n). Any suggestions for how to speed up this conversion with a vectorized operation that doesn't involve a for loop?

Thanks!

CodePudding user response:

Here is one approach. You will still have to loop at some point; here the loop runs once, to invert the dictionary so that each index in arr2 maps to its index in arr1:

d2 = {v: k for k,l in d.items() for v in l}
# {0: 0, 1: 0, 3: 0, 4: 1, 5: 1, 6: 1, 2: 2, 7: 2, 8: 2, 9: 3}

arr2 = np.full((10,2), np.nan)

# keys of d2 are rows of arr2, values are the matching rows of arr1
arr2[list(d2)] = arr1[list(d2.values())]

output:

array([[ 0. ,  1. ],
       [ 0. ,  1. ],
       [ nan,  6.5],
       [ 0. ,  1. ],
       [ nan,  nan],
       [ nan,  nan],
       [ nan,  nan],
       [ nan,  6.5],
       [ nan,  6.5],
       [16. ,  0.2]])
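
For the larger case in the question (600 rows mapped to 198812 rows), the inverted mapping can also be built as two flat index arrays instead of a second dictionary. The following is only a sketch, not part of the original answer: the names keys, lengths, src, dst and inv are made up for illustration, and it assumes that the values of d together cover every row index of arr2 exactly once, as in the example.

```python
import numpy as np

d = {0: [0, 1, 3], 1: [4, 5, 6], 2: [2, 7, 8], 3: [9]}

arr1 = np.array([[0, 1],
                 [np.nan, np.nan],
                 [np.nan, 6.5],
                 [16, 0.2]])

# One pass over the dictionary to flatten it into parallel index arrays.
keys = np.fromiter(d.keys(), dtype=np.intp, count=len(d))
lengths = np.fromiter((len(v) for v in d.values()), dtype=np.intp, count=len(d))
dst = np.concatenate([np.asarray(v, dtype=np.intp) for v in d.values()])  # rows of arr2
src = np.repeat(keys, lengths)  # row of arr1 that feeds each entry of dst

# Optional: skip rows of arr1 that are entirely NaN, like the loop in the question.
keep = ~np.isnan(arr1).all(axis=1)[src]

# Assumption: dst is a permutation of 0..dst.size-1, so arr2 has dst.size rows.
arr2 = np.full((dst.size, arr1.shape[1]), np.nan)
arr2[dst[keep]] = arr1[src[keep]]

# If the mapping is fixed and arr1 changes often, a lookup array can be cached
# so that each conversion is a single gather: inv[i] is the arr1 row for arr2 row i.
inv = np.empty(dst.size, dtype=np.intp)
inv[dst] = src
arr2_gather = arr1[inv]
```

The dictionary is still traversed once in Python, but the index arrays can be computed a single time and reused for every new arr1, so the per-conversion work is one fancy-indexing assignment.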