I have two 2d arrays that contain XYZ points, A and B.
Array A has the shape (796704, 3) and is my original pointcloud. Each point is unique except for (0, 0, 0) but those don't matter:
A = [[x_1, y_1, z_1],
[x_2, y_2, z_2],
[x_3, y_3, z_3],
[x_4, y_4, z_4],
[x_5, y_5, z_5],
...]
Array B has the shape (N, 4) and is a cropped version of A (N<796704).
The remaining points did not change and are still equal to their counterpart in A.
The fourth column contains the segmentation value of each point.
The row order of B is completely random and doesn't match A anymore.
B = [[x_4, y_4, z_4, 5],
[x_2, y_2, z_2, 12],
[x_6, y_6, z_6, 5],
[x_7, y_7, z_7, 3],
[x_9, y_9, z_9, 3]]
I need to reorder the rows of B so that they match the rows of A with the same point and fill in the gaps with a zero row:
B = [[0.0, 0.0, 0.0, 0],
[x_2, y_2, z_2, 12],
[0.0, 0.0, 0.0, 0],
[x_4, y_4, z_4, 5],
[0.0, 0.0, 0.0, 0],
[x_6, y_6, z_6, 5],
[x_7, y_7, z_7, 3],
[0.0, 0.0, 0.0, 0],
[x_9, y_9, z_9, 3],
[0.0, 0.0, 0.0, 0],
[0.0, 0.0, 0.0, 0],
[0.0, 0.0, 0.0, 0]
...]
In the end B should have the shape (796704, 4).
I tried using the numpy_indexed package, as proposed in this very similar question, but the issue here is that B doesn't contain all the points of A:
import numpy_indexed as npi
B[npi.indices(B[:, :-1], A)]
I'm not familiar with numpy and my only solution would be a for-loop, but that would be far too slow for my application. Is there a fast way to solve this problem?
CodePudding user response:
Pandas => reindex:
import pandas as pd
import numpy as np
A = np.array([[8, 7, 4],
[0, 7, 7],
[4, 7, 0],
[5, 5, 8],
[8, 7, 5]])
B = np.array([[8, 7, 4, 2],
[4, 7, 0, 5],
[8, 7, 5, 6]])
df_B = (pd.DataFrame(B, columns=["x", "y", "z", "seg"])
.set_index(["x", "y", "z"])
.reindex(list(map(tuple, A)))
.reset_index())
df_B.loc[df_B.seg.isna()] = 0
B = df_B.values
print(B)
Result:
array([[8., 7., 4., 2.],
[0., 0., 0., 0.],
[4., 7., 0., 5.],
[0., 0., 0., 0.],
[8., 7., 5., 6.]])
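A left merge is another way to get the same alignment. The sketch below is only an illustration under a few assumptions: the column names are arbitrary, the points in B are exactly equal to their counterparts in A, and no point appears twice in B (otherwise the merge would return more rows than A has). It does not build a Python tuple per row, which may matter for an array with roughly 800,000 rows, although I have not timed it.

```python
import numpy as np
import pandas as pd

A = np.array([[8, 7, 4],
              [0, 7, 7],
              [4, 7, 0],
              [5, 5, 8],
              [8, 7, 5]])
B = np.array([[8, 7, 5, 6],
              [8, 7, 4, 2],
              [4, 7, 0, 5]])

cols = ["x", "y", "z"]  # arbitrary column names chosen for this sketch
df_A = pd.DataFrame(A, columns=cols)
df_B = pd.DataFrame(B, columns=cols + ["seg"])

# A left merge keeps one output row per row of A, in A's order;
# points that are missing from B get NaN in the "seg" column.
merged = df_A.merge(df_B, how="left", on=cols)

# Turn every unmatched row into a full zero row.
missing = merged["seg"].isna().to_numpy()
aligned = merged.to_numpy(dtype=float)
aligned[missing] = 0.0

print(aligned)
```

Here `aligned` has as many rows as A, and the rows of B sit at the positions of their matching points.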
CodePudding user response:
Solving your problem just with numpy:
Case 1
You're working just with numbers:
import numpy as np
A = np.array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9],
[10,10, 10]
])
B = np.array([[4, 4, 4, 5],
[2, 2, 2, 12],
[6, 6, 6, 5],
[7, 7, 7, 3],
[9, 9, 9, 3]])
c = np.insert(A, 3, 0, axis = 1)
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))
print(d)
Out:
[[ 4 4 4 5]
[ 2 2 2 12]
[ 6 6 6 5]
[ 7 7 7 3]
[ 9 9 9 3]
[ 0 0 0 0] # previously 1, 1, 1, 0
[ 0 0 0 0] # previously 3, 3, 3, 0
[ 0 0 0 0] # previously 5, 5, 5, 0
[ 0 0 0 0] # previously 8, 8, 8, 0
[ 0 0 0 0]] # previously 10, 10, 10, 0
Explanation:
1º c will be a copy of A with a new column filled with 0:
c = np.insert(A, 3, 0, axis = 1)
If I print c right now I will get this:
[[ 1 1 1 0]
[ 2 2 2 0]
[ 3 3 3 0]
[ 4 4 4 0]
[ 5 5 5 0]
[ 6 6 6 0]
[ 7 7 7 0]
[ 8 8 8 0]
[ 9 9 9 0]
[10 10 10 0]]
2º You create a new array from B and the rows of c that are not in B, multiplied by 0.
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))
2.1 np.vstack((B,_))
Here I replaced the second argument with _ to make it easier to see what vstack receives: a tuple with the two arrays that you want to concatenate.
2.2 c[np.in1d(c[:,0],B[:,0], invert=True)]*0
Instead of passing all of c, I pass only the rows of c selected by np.in1d(c[:,0],B[:,0], invert=True), multiplied by 0.
2.3 np.in1d(c[:,0],B[:,0], invert=True)
If I do np.in1d(c[:,0],B[:,0]) I get a boolean array telling me which x_n of c also exist in B. If I set invert=True I'll get which x_n of c do NOT exist in B. (Another way to do that inversion is the tilde operator ~, so ~np.in1d(c[:,0],B[:,0]) == np.in1d(c[:,0],B[:,0], invert=True).)
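A small check of that equivalence, as a sketch only. It uses np.isin, which the NumPy documentation recommends over np.in1d for new code and which accepts the same invert argument. The names c_x and b_x are made up here and stand for c[:,0] and B[:,0] from the example.

```python
import numpy as np

c_x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # stands in for c[:, 0]
b_x = np.array([4, 2, 6, 7, 9])                  # stands in for B[:, 0]

present = np.isin(c_x, b_x)               # True where the x value of c also occurs in B
absent = np.isin(c_x, b_x, invert=True)   # True where it does not

# Both ways of inverting the mask give the same result.
assert np.array_equal(absent, ~present)

print(c_x[absent])
```

Keep in mind that this mask looks at the x column only, so it identifies a point correctly only when no two different points share the same x value.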
Since each point is unique with the exception of the 0,0,0,0 ones, when I do c[np.in1d(c[:,0],B[:,0], invert=True)] I get:
array([[ 1, 1, 1, 0],
[ 3, 3, 3, 0],
[ 5, 5, 5, 0],
[ 8, 8, 8, 0],
[10, 10, 10, 0]])
If I multiply by 0 I get:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
So in np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0)) I concatenate B and the selected rows of c.
B is this:
array([[ 4, 4, 4, 5],
[ 2, 2, 2, 12],
[ 6, 6, 6, 5],
[ 7, 7, 7, 3],
[ 9, 9, 9, 3]])
and the selected part of c is the array of 0's above. The result at the end is:
array([[ 4, 4, 4, 5],
[ 2, 2, 2, 12],
[ 6, 6, 6, 5],
[ 7, 7, 7, 3],
[ 9, 9, 9, 3],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]])
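Note that the result above appends the zero rows after B and tests membership on the x column only. If the rows also have to end up at the same positions as in A, which is what the question asks for, one possible NumPy-only sketch follows. The helper name align_to_reference is made up for illustration. It assumes that every point in B is exactly equal to its counterpart in A and that no point occurs twice in B.

```python
import numpy as np

def align_to_reference(A, B):
    """Place each row of B at the position of its matching XYZ point in A.

    Rows of A that have no match in B become zero rows.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    n_a = len(A)

    # Give every distinct XYZ row of A and B a shared integer id.
    stacked = np.vstack((A, B[:, :3]))
    _, ids = np.unique(stacked, axis=0, return_inverse=True)
    ids = ids.reshape(-1)  # guard: some NumPy versions return a 2-D inverse here
    ids_a, ids_b = ids[:n_a], ids[n_a:]

    # For each id, store the row of B that carries it (-1 means "not in B").
    row_in_b = np.full(ids.max() + 1, -1, dtype=np.intp)
    row_in_b[ids_b] = np.arange(len(B))

    # Look up, for every row of A, which row of B to copy.
    src = row_in_b[ids_a]
    found = src >= 0

    out = np.zeros((n_a, B.shape[1]), dtype=B.dtype)
    out[found] = B[src[found]]
    return out

A = np.array([[i, i, i] for i in range(1, 11)])
B = np.array([[4, 4, 4, 5],
              [2, 2, 2, 12],
              [6, 6, 6, 5],
              [7, 7, 7, 3],
              [9, 9, 9, 3]])

aligned = align_to_reference(A, B)
print(aligned)
```

The whole row is used as the key here, not just x, and the work is done by a sort inside np.unique instead of a Python loop over the points.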
Case 2
If you are working with strings and numbers, you can do it this way:
import numpy as np
A = np.array([['x_1', 'y_1', 'z_1'],
['x_2', 'y_2', 'z_2'],
['x_3', 'y_3', 'z_3'],
['x_4', 'y_4', 'z_4'],
['x_5', 'y_5', 'z_5'],
['x_6', 'y_6', 'z_6'],
['x_7', 'y_7', 'z_7'],
['x_8', 'y_8', 'z_8'],
['x_9', 'y_9', 'z_9'],
['x_10', 'y_10', 'z_10']
])
B = np.array([['x_4', 'y_4', 'z_4', 5],
['x_2', 'y_2', 'z_2', 12],
['x_6', 'y_6', 'z_6', 5],
['x_7', 'y_7', 'z_7', 3],
['x_9', 'y_9', 'z_9', 3]])
c = np.insert(A, 3, 0, axis = 1)
c[np.in1d(c[:,0],B[:,0], invert=True)] = 0
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))
print(d)
Out:
[['x_4' 'y_4' 'z_4' '5']
['x_2' 'y_2' 'z_2' '12']
['x_6' 'y_6' 'z_6' '5']
['x_7' 'y_7' 'z_7' '3']
['x_9' 'y_9' 'z_9' '3']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']]
Explanation:
1º c will be a copy of A with a new column filled with 0:
c = np.insert(A, 3, 0, axis = 1)
If I print c right now I will get this:
[['x_1' 'y_1' 'z_1' '0']
['x_2' 'y_2' 'z_2' '0']
['x_3' 'y_3' 'z_3' '0']
['x_4' 'y_4' 'z_4' '0']
['x_5' 'y_5' 'z_5' '0']
['x_6' 'y_6' 'z_6' '0']
['x_7' 'y_7' 'z_7' '0']
['x_8' 'y_8' 'z_8' '0']
['x_9' 'y_9' 'z_9' '0']
['x_10' 'y_10' 'z_10' '0']]
2º The rows of c that don't exist in B I'll set to 0:
c[np.in1d(c[:,0],B[:,0], invert=True)] = 0
3º d will be B plus the part of c that was set to 0:
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))
Since in this case you're working with strings and numbers in the same array, you can't just multiply everything by 0 when building d. So you need to set the rows of c to 0 first and then select those 0 rows.