I have an image stored as 3 Numpy arrays:
# Int arrays of coordinates
# Not continuous, some points are omitted
X_image = np.array([1,2,3,4,5,6,7,9])
Y_image = np.array([9,8,7,6,5,4,3,1])
# Float array of RGB values.
# Same index
rgb = np.array([
[0.5543,0.2665,0.5589],
[0.5544,0.1665,0.5589],
[0.2241,0.6645,0.5249],
[0.2242,0.6445,0.2239],
[0.2877,0.6425,0.5829],
[0.5543,0.3165,0.2839],
[0.3224,0.4635,0.5879],
[0.5534,0.6693,0.5889],
])
The RGB information is not convertible to int. So it has to stay floats
I have another array that defines the position of an area of some pixels in the image:
X_area = np.array([3,4,6])
Y_area = np.array([7,6,4])
I need to find the RGB information for these pixels, using the first 4 arrays as a reference.
My idea was to search for the index of these area points in the full image and then use this index to find back the RGB information.
index = search_for_index_of_array_1_in_array_2((X_area,Y_area),(X_image,Y_image))
# index shall be [3,4,6]
rgb_area = rgb[index]
The search_for_index_of_array_1_in_array_2 can be implemented with a for loop. I tried it, this is too slow. I actually have millions of points.
I know that it is probably more of a use case for Julia than Python, as we deal with low-level data manipulation with a performance need, but I'm obliged to use Python. So, the only performance trick I see is to use a vectorized solution with NumPy.
I'm not used to manipulating NumPy. I tried numpy.where.
index = np.where(X_area in X_image and Y_area in Y_image )
index
Gives :
<ipython-input-18-0e434ab7a291>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
index = np.where(X_area in X_image and Y_area in Y_image )
(array([], dtype=int64),)
It shall be empty as we have 3 compliant points.
I also tested, with the same result:
XY_image = np.vstack((X_image,Y_image))
XY_area = np.vstack((X_area,Y_area))
index = np.where(XY_area == XY_image)
and even:
np.extract(XY_image == XY_area, XY_image)
If I get it, the issue is that the arrays do not have the same length. But this is what I have.
Do you have an idea of how to proceed?
Thanks
Edit: here is a loop that works but... is not fast:
indexes = []
for i in range(XY_area.shape[1]):
XY_area_b = np.broadcast_to(XY_area[:,i],(9,2)).transpose()
where_in_image = np.where(XY_area_b == XY_image)
index_in_image = where_in_image[1][1]
indexes.append(index_in_image)
indexes
CodePudding user response:
The classical method to solve this problem is generally to use a hashmap. However, Numpy do not provide such a data structure. That being said, an alternative (generally slower) solution is to sort the values and then perform a binary search. Hopefully, Numpy provide useful functions to do that. This solution run in O(n log(m))
(with n
the number of value to search and m
the number of value searched) should be much faster than a linear search running in O(n m)
time. Here is an example:
# Format the inputs
valType = X_image.dtype
assert Y_image.dtype == valType and X_area.dtype == valType and X_image.dtype == valType
pointType = [('x', valType),('y', valType)]
XY_image = np.ravel(np.column_stack((X_image, Y_image))).view(pointType)
XY_area = np.ravel(np.column_stack((X_area, Y_area))).view(pointType)
# Build an index to sort XY_image and then generate the sorted points
sortingIndex = np.argsort(XY_image)
sorted_XY_image = XY_image[sortingIndex]
# Search each value of XY_area in XY_image and find the location in the unsorted array
tmp = np.searchsorted(XY_image, XY_area)
index = sortingIndex[tmp]
rgb_area = rgb[index]
CodePudding user response:
Thanks to Jérôme's answer, I understand better the value of using a hashmap:
def hashmap(X,Y):
return X 10000*Y
h_area = hashmap(X_area,Y_area)
h_image = hashmap(X_image,Y_image)
np.where(np.isin(h_image,h_area))
This hashmap is a bit brutal, but it actually returns the indexes:
(array([2, 3, 5], dtype=int64),)