Replace values in a 2D/3D np.array using a lookup table (an np.array with 2 columns: key, value)

How can I replace all values in a 2D (or 3D) np.array x using a two-column lookup table lookup (another np.array, with the key in the first column and the replacement value in the second)?

x = np.array([[0., 1., 5., 2.], 
              [5., 1., 3., 5.], 
              [4., 1., 1., 2.], 
              [0., 1., 3., 2.], 
              [2., 4., 1., 0.]])

x may also be 3D and the shape is more or less arbitrary.

lookup = np.array([[0, 1.2], 
                   [1, 3.4],
                   [2, 0.1], 
                   [3, 2.1], 
                   [4, 5.4], 
                   [5, 2.2]])

Result:

>>> x
array([[1.2, 3.4, 2.2, 0.1],
       [2.2, 3.4, 2.1, 2.2],
       [5.4, 3.4, 3.4, 0.1],
       [1.2, 3.4, 2.1, 0.1],
       [0.1, 5.4, 3.4, 1.2]])

Bonus: Normally all values in x are represented in the first column of lookup. What is the best way to handle values in x that are not represented in lookup, e.g. by leaving them unchanged or by setting them to nan?

A somewhat inefficient approach so far (only working for 2D, but it may easily be adapted to 3D): iterate through all elements in x and compare each one with the keys in lookup.

def replaceByLookup(x, lookup):
    # Modifies x in place; x must be 2D
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Linear search for the key matching the current element
            for k in range(lookup.shape[0]):
                if x[i,j] == lookup[k,0]:
                    x[i,j] = lookup[k,1]
                    break

I am looking for a more efficient and maybe simpler solution. I wonder if there isn't a vectorized solution within numpy. It would also be totally ok if the function does not modify x in place but returns a new array with the replaced values.

CodePudding user response：

You can use np.searchsorted to efficiently locate the row of the associated key in lookup. You can then get the associated value with direct indexing. Here is an example:

lookup[np.searchsorted(lookup[:,0], x),1]

Note that this requires every key in x to exist in lookup and lookup to be sorted by key. Moreover, you should be careful with floating-point keys, as the values in x may differ slightly from the keys in lookup. One way to address this problem is to round the values first. Furthermore, the keys in lookup should not contain special values like np.nan (that being said, they can be supported separately).
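As a minimal sketch of the approach above, the following applies np.searchsorted to the x and lookup arrays from the question. The argsort step and the names lookup_sorted, idx and result are assumptions added for illustration; the sort only matters if lookup is not already sorted by key.

```python
import numpy as np

x = np.array([[0., 1., 5., 2.],
              [5., 1., 3., 5.],
              [4., 1., 1., 2.],
              [0., 1., 3., 2.],
              [2., 4., 1., 0.]])

lookup = np.array([[0, 1.2],
                   [1, 3.4],
                   [2, 0.1],
                   [3, 2.1],
                   [4, 5.4],
                   [5, 2.2]])

# Assumption: sort the table by key in case it is not sorted yet,
# because np.searchsorted expects a sorted array to search in.
lookup_sorted = lookup[np.argsort(lookup[:, 0])]

# Row index in the sorted table for every element of x (same shape as x).
idx = np.searchsorted(lookup_sorted[:, 0], x)

# Indexing with an integer array keeps the shape of idx,
# so the same line works for 2D and 3D inputs.
result = lookup_sorted[idx, 1]
```

Here result is a new array with the same shape as x, and x itself is left untouched. Its first row is [1.2, 3.4, 2.2, 0.1].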

Bonus answer:

np.searchsorted searches for the indices where elements should be inserted to maintain order. If a value does not exist in the searched array, the function returns the index of the next larger item, or len(lookup) if the value is larger than every key. You can check whether the key at that index matches the searched value to know whether the lookup succeeded. That being said, you need to make sure the index is actually valid first, which is a bit cumbersome. Here is the resulting code:

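idx = np.searchsorted(lookup[:,0], x)
# Clamp the indices so values above the largest key stay in bounds
corrected_idx = np.minimum(idx, len(lookup)-1)
# The lookup succeeded only where the key found equals the searched value
is_valid = lookup[corrected_idx, 0] == x
# Replace the values that were found and set the others to nan
x[is_valid] = lookup[corrected_idx[is_valid], 1]
x[~is_valid] = np.nan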
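Both options from the bonus question (leave unknown values unchanged, or set them to nan) could be wrapped in one function that returns a new array. This is only a sketch: the function name replace_with_lookup and the keep_missing flag are hypothetical, and exact key equality is assumed to be acceptable (see the floating-point caveat above).

```python
import numpy as np

def replace_with_lookup(x, lookup, keep_missing=False):
    """Return a new array with the keys in x replaced by their values.

    Hypothetical helper: values of x without a matching key become nan,
    or are kept as they are when keep_missing is True.
    """
    x = np.asarray(x, dtype=float)
    # Sort the table by key so that np.searchsorted can be used.
    order = np.argsort(lookup[:, 0])
    keys = lookup[order, 0]
    values = lookup[order, 1]
    # Clamp the indices so values above the largest key stay in bounds.
    idx = np.minimum(np.searchsorted(keys, x), len(keys) - 1)
    # A lookup succeeded only where the key found equals the searched value.
    is_valid = keys[idx] == x
    fallback = x if keep_missing else np.nan
    return np.where(is_valid, values[idx], fallback)

lookup = np.array([[0, 1.2], [1, 3.4], [2, 0.1]])

# A small 3D input in which 7 and -3 have no key in lookup.
x3d = np.array([[[0., 1.], [2., 7.]],
                [[1., -3.], [0., 2.]]])

as_nan = replace_with_lookup(x3d, lookup)
kept = replace_with_lookup(x3d, lookup, keep_missing=True)
```

In as_nan the two unknown entries are nan, while in kept they still hold 7 and -3. All other entries are replaced in both results.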

CodePudding user response：

Couldn't you use a Python dictionary in place of the lookup array? Dictionary lookups take constant time on average, which avoids searching through the array for every element.

It would be worthwhile to convert the lookup array into a Python dictionary first and then assign from that dictionary. The code would be very simple, would require no searching, and would benefit from the dictionary's efficient random access.
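A possible sketch of the dictionary idea is shown here. The names mapping and replace are assumptions, and np.vectorize is used only for convenience: it still calls the Python function once per element, so it may well be slower than a fully vectorized approach on large arrays.

```python
import numpy as np

lookup = np.array([[0, 1.2],
                   [1, 3.4],
                   [2, 0.1],
                   [3, 2.1],
                   [4, 5.4],
                   [5, 2.2]])

# Small example input; 9 has no key in lookup.
x = np.array([[0., 1., 5.],
              [2., 9., 1.]])

# Build the dictionary once: key column -> value column.
mapping = dict(zip(lookup[:, 0].tolist(), lookup[:, 1].tolist()))

# Element-wise dictionary access; missing keys fall back to nan.
replace = np.vectorize(lambda key: mapping.get(key, np.nan), otypes=[float])
result = replace(x)
```

The result has the same shape as x, and the entry for the unknown key 9 is nan. Because np.vectorize works element by element, the same call also accepts 3D arrays.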