array = np.array(
[[ 1., 1., 82. , 177., 0., 0., -1. ],
[ 2., 2., 83. , 177., 0., 0., 1. ],
[ 3., 2., 84. , 177., 0., 0., 2. ],
[ 4., 2., 85. , 177., 0., 0., 2. ],
[ 5., 2., 82.5, 177., 0., 0., 2. ],
[ 6., 2., 83.5, 177., 0., 0., 3. ]])
I then have a list of new elements to append:
new_points = np.array(
[[ 7., 2., 82.5, 177., 0., 0., 2. ],
[ 8., 2., 83.5, 177., 0., 0., 4. ],
[ 9., 2., 84.5, 177., 0., 0., 4. ],
[ 10., 2., 84. , 177., 0., 0., 4. ]])
As you can see, some of the new rows have the same values in the 3rd and 4th columns as rows that are already present in the array. I want to append only the points whose combination of values in the 3rd and 4th columns is not present in the original array.
The output that I expect is:
array = [[ 1. 1. 82. 177. 0. 0. -1. ]
[ 2. 2. 83. 177. 0. 0. 1. ]
[ 3. 2. 84. 177. 0. 0. 2. ]
[ 4. 2. 85. 177. 0. 0. 2. ]
[ 5. 2. 82.5 177. 0. 0. 2. ]
[ 6. 2. 83.5 177. 0. 0. 3. ]
[ 9. 2. 84.5 177. 0. 0. 4. ]]
CodePudding user response:
You can use the following approach to achieve that goal:
import numpy as np
array = np.array(
[[ 1., 1., 82. , 177., 0., 0., -1. ],
[ 2., 2., 83. , 177., 0., 0., 1. ],
[ 3., 2., 84. , 177., 0., 0., 2. ],
[ 4., 2., 85. , 177., 0., 0., 2. ],
[ 5., 2., 82.5, 177., 0., 0., 2. ],
[ 6., 2., 83.5, 177., 0., 0., 3. ]])
new_points = np.array(
[[ 7., 2., 82.5, 177., 0., 0., 2. ],
[ 8., 2., 83.5, 177., 0., 0., 4. ],
[ 9., 2., 84.5, 177., 0., 0., 4. ],
[ 10., 2., 84. , 177., 0., 0., 4. ]])
filtered_points = []
for point in new_points:
    # Keep the point only if no existing row has the same 3rd and 4th column values
    if not np.any((array[:,2] == point[2]) & (array[:,3] == point[3])):
        filtered_points.append(point)
result = np.concatenate((array, filtered_points))
print(result)
Output:
[[ 1. 1. 82. 177. 0. 0. -1. ]
[ 2. 2. 83. 177. 0. 0. 1. ]
[ 3. 2. 84. 177. 0. 0. 2. ]
[ 4. 2. 85. 177. 0. 0. 2. ]
[ 5. 2. 82.5 177. 0. 0. 2. ]
[ 6. 2. 83.5 177. 0. 0. 3. ]
[ 9. 2. 84.5 177. 0. 0. 4. ]]
Explanation:
1. Create an empty list called filtered_points.
2. Iterate over the rows in new_points. For each row, check whether its combination of values in the 3rd and 4th columns is present in array. If it is not present, append the row to filtered_points.
3. Use the np.concatenate function to concatenate array and filtered_points, and assign the result to a new variable called result.
The resulting result array will contain the original rows of array, followed by only those rows from new_points whose combination of values in the 3rd and 4th columns is not present in array.
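One thing to watch for: if every new point turns out to be a duplicate, filtered_points stays an empty list and np.concatenate raises a ValueError, because an empty list is treated as a 1-D array. Below is a minimal sketch of the same loop wrapped in a function that guards against that case. The function name append_unique_points and the cols parameter are just illustrative names, not part of the original code.

```python
import numpy as np

def append_unique_points(array, new_points, cols=(2, 3)):
    """Append rows of new_points whose values in `cols` do not occur in array."""
    c0, c1 = cols
    filtered_points = []
    for point in new_points:
        # True if some existing row matches this point in both columns
        duplicate = np.any((array[:, c0] == point[c0]) & (array[:, c1] == point[c1]))
        if not duplicate:
            filtered_points.append(point)
    if not filtered_points:
        # Nothing to add: return a copy instead of concatenating an empty list
        return array.copy()
    return np.concatenate((array, np.array(filtered_points)))

array = np.array(
    [[1., 1., 82. , 177., 0., 0., -1.],
     [2., 2., 83. , 177., 0., 0., 1.],
     [3., 2., 84. , 177., 0., 0., 2.],
     [4., 2., 85. , 177., 0., 0., 2.],
     [5., 2., 82.5, 177., 0., 0., 2.],
     [6., 2., 83.5, 177., 0., 0., 3.]])
new_points = np.array(
    [[ 7., 2., 82.5, 177., 0., 0., 2.],
     [ 8., 2., 83.5, 177., 0., 0., 4.],
     [ 9., 2., 84.5, 177., 0., 0., 4.],
     [10., 2., 84. , 177., 0., 0., 4.]])

result = append_unique_points(array, new_points)
print(result)
```

Called with only the first two rows of new_points (both duplicates), the function returns the original six rows unchanged.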
CodePudding user response:
A pure numpy vectorized solution
This is an interesting problem that shows how powerful numpy can be once you understand broadcasting. No for loops are needed here.
You can do this in a completely vectorized way: use broadcasting to compare the specific columns (3rd, 4th) of every row of array against every row of new_points, reduce the comparison to a boolean mask over the new points, and stack the rows that pass the mask onto the original, as below. For more details, read the NumPy documentation on how broadcasting works.
cond = ~(array[:,None,2:4] == new_points[None,:,2:4]).all(-1).any(0)
updated_array = np.vstack([array,new_points[cond]])
updated_array
array([[ 1. , 1. , 82. , 177. , 0. , 0. , -1. ],
[ 2. , 2. , 83. , 177. , 0. , 0. , 1. ],
[ 3. , 2. , 84. , 177. , 0. , 0. , 2. ],
[ 4. , 2. , 85. , 177. , 0. , 0. , 2. ],
[ 5. , 2. , 82.5, 177. , 0. , 0. , 2. ],
[ 6. , 2. , 83.5, 177. , 0. , 0. , 3. ],
[ 9. , 2. , 84.5, 177. , 0. , 0. , 4. ]])
EXPLANATION
Here is the flow of shapes for each step -
#Broadcasting rules
(6, 1, 2) # array[:,None,2:4]
(1, 4, 2) # new_points[None,:,2:4]
---------
(6, 4, 2) # == compare with broadcasting
---------
(6, 4) # .all(-1)
(4,) # .any(0)
(4,) # ~ invert boolean
- array[:,None,2:4] and new_points[None,:,2:4] fetch the 3rd and 4th columns and also add a dummy dimension to each array. This makes their shapes (6, 1, 2) and (1, 4, 2) respectively. Why is this important? Because it allows us to use broadcasting to compare the 6 rows to the 4 rows. This is done with just the == step.
- The step array[:,None,2:4] == new_points[None,:,2:4] results in a (6, 4, 2) boolean array that compares the 3rd and 4th column values of each of the 6 rows against those of each of the 4 rows.
- Since you want both values to match exactly, you can use .all(-1), which reduces the last dimension and gives you a (6, 4) matrix of True and False values. (True means both values match; False means at least one of them doesn't.)
array([[False, False, False, False],
[False, False, False, False],
[False, False, False, True],
[False, False, False, False],
[ True, False, False, False],
[False, True, False, False]])
- Finally, since I want to flag the new points that duplicate a row in the original matrix (based on the 3rd and 4th columns), I can reduce the first axis (0) with .any(0) to get a (4,) boolean array:
array([ True, True, False, True])
- Oh, and we finally want to flip True and False, since in boolean indexing True means keep and False means drop. This is done with the ~ at the start.
array([False, False, True, False]) #this is the cond variable
- The last step is to filter the new_points array with cond and stack the result onto array, using np.vstack.
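If the one-liner is hard to follow, the steps above can be unrolled into separate statements. This is only a sketch for inspection; the intermediate names (a, b, eq, row_match, is_duplicate) are mine and do not appear in the original one-liner.

```python
import numpy as np

array = np.array(
    [[1., 1., 82. , 177., 0., 0., -1.],
     [2., 2., 83. , 177., 0., 0., 1.],
     [3., 2., 84. , 177., 0., 0., 2.],
     [4., 2., 85. , 177., 0., 0., 2.],
     [5., 2., 82.5, 177., 0., 0., 2.],
     [6., 2., 83.5, 177., 0., 0., 3.]])
new_points = np.array(
    [[ 7., 2., 82.5, 177., 0., 0., 2.],
     [ 8., 2., 83.5, 177., 0., 0., 4.],
     [ 9., 2., 84.5, 177., 0., 0., 4.],
     [10., 2., 84. , 177., 0., 0., 4.]])

a = array[:, None, 2:4]          # shape (6, 1, 2)
b = new_points[None, :, 2:4]     # shape (1, 4, 2)
eq = a == b                      # shape (6, 4, 2), element-wise comparison
row_match = eq.all(-1)           # shape (6, 4), True where both columns match
is_duplicate = row_match.any(0)  # shape (4,), True if a new point matches any row
cond = ~is_duplicate             # shape (4,), True for the points to keep

updated_array = np.vstack([array, new_points[cond]])

for name, value in [("a", a), ("b", b), ("eq", eq),
                    ("row_match", row_match), ("cond", cond)]:
    print(name, value.shape)
```

Printing the shapes this way is a quick check that the broadcasting lines up as described in the shape flow.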
Benchmarks -
Just to show the power of vectorization, here is a benchmark against the for-loop solution from the other answer -
#Vectorized solution
%%timeit
cond = ~(array[:,None,2:4] == new_points[None,:,2:4]).all(-1).any(0)
updated_array = np.vstack([array,new_points[cond]])
# 9.77 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# For loop solution
%%timeit
filtered_points = []
for point in new_points:
    if not np.any((array[:,2] == point[2]) & (array[:,3] == point[3])):
        filtered_points.append(point)
result = np.concatenate((array, filtered_points))
#28.8 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
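The timings above are for the tiny 6-row and 4-row inputs from the question. If you want to repeat the comparison outside IPython or on larger inputs, something like the sketch below could be used. The function names, the random test data, and the sizes (500 and 200 rows) are assumptions made for illustration, and the numbers you get will depend on your machine. Keep in mind that the broadcast comparison builds an intermediate boolean array of shape (N, M, 2), so its memory use grows with the product of the two row counts.

```python
import timeit
import numpy as np

def vectorized(array, new_points):
    # Broadcast comparison of columns 3 and 4, as in the answer above
    cond = ~(array[:, None, 2:4] == new_points[None, :, 2:4]).all(-1).any(0)
    return np.vstack([array, new_points[cond]])

def looped(array, new_points):
    # Row-by-row check, as in the for-loop answer
    filtered = [p for p in new_points
                if not np.any((array[:, 2] == p[2]) & (array[:, 3] == p[3]))]
    if not filtered:
        return array.copy()
    return np.concatenate((array, np.array(filtered)))

# Hypothetical test data: integer-valued floats so that exact matches occur
rng = np.random.default_rng(0)
array = rng.integers(0, 50, size=(500, 7)).astype(float)
new_points = rng.integers(0, 50, size=(200, 7)).astype(float)

# Both implementations should produce the same rows in the same order
same = np.array_equal(vectorized(array, new_points), looped(array, new_points))
print("results agree:", same)

runs = 20
for fn in (vectorized, looped):
    seconds = timeit.timeit(lambda: fn(array, new_points), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} us per call")
```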
CodePudding user response:
Something like this should work:
import math
array = [
[1., 1., 82. , 177., 0., 0., -1.],
[2., 2., 83. , 177., 0., 0., 1.],
[3., 2., 84. , 177., 0., 0., 2.],
[4., 2., 85. , 177., 0., 0., 2.],
[5., 2., 82.5, 177., 0., 0., 2.],
[6., 2., 83.5, 177., 0., 0., 3.]
]
new_points = [
[ 7., 2., 82.5, 177., 0., 0., 2.],
[ 8., 2., 83.5, 177., 0., 0., 4.],
[ 9., 2., 84.5, 177., 0., 0., 4.],
[10., 2., 84. , 177., 0., 0., 4.]
]
for point in new_points:
    # Check if the combination of values in the 3rd and 4th columns is present in the array
    if not any(math.isclose(point[2], x[2]) and math.isclose(point[3], x[3]) for x in array):
        # If the combination is not present, append the point to the array
        array.append(point)
# The resulting array will contain the original points plus the new points that had a unique combination of values in the 3rd and 4th columns
print(array)
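You shouldn't compare floats with the == operator: rounding error can make two values that should be equal differ in their last bits, which is why the code above uses math.isclose to compare within a tolerance.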
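If you want the tolerance-based comparison without the Python loop, one option is to swap == for np.isclose in the broadcasting approach from the other answer. This is a sketch, not a drop-in equivalent of the code above: np.isclose uses different default tolerances (rtol=1e-05, atol=1e-08) than math.isclose (rel_tol=1e-09, abs_tol=0.0), so adjust them to suit your data.

```python
import numpy as np

# Why == is risky for floats: these two values differ in the last bits
print((0.1 + 0.2) == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))    # True

array = np.array(
    [[1., 1., 82. , 177., 0., 0., -1.],
     [2., 2., 83. , 177., 0., 0., 1.],
     [3., 2., 84. , 177., 0., 0., 2.],
     [4., 2., 85. , 177., 0., 0., 2.],
     [5., 2., 82.5, 177., 0., 0., 2.],
     [6., 2., 83.5, 177., 0., 0., 3.]])
new_points = np.array(
    [[ 7., 2., 82.5, 177., 0., 0., 2.],
     [ 8., 2., 83.5, 177., 0., 0., 4.],
     [ 9., 2., 84.5, 177., 0., 0., 4.],
     [10., 2., 84. , 177., 0., 0., 4.]])

# (6, 1, 2) against (1, 4, 2) broadcasts to a (6, 4, 2) boolean array
close = np.isclose(array[:, None, 2:4], new_points[None, :, 2:4])
# Keep a new point only if no existing row is close in both columns
cond = ~close.all(-1).any(0)
result = np.vstack([array, new_points[cond]])
print(result)
```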