I have a 2D NumPy array whose cells are tuples, either empty or holding two elements: an int and a str.
Here is an example of what the 2D array may look like:
matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)
I want to remove the rows that contain only empty tuples.
I tried the following code:
matrix = matrix[~np.all(matrix == (), axis=1)]
but it gave me the following error:
numpy.AxisError: axis 1 is out of bounds for array of dimension 0
The same code works for a 2D array that contains only integers, with a condition such as matrix == 0 inside the all function. It correctly removes all rows that contain only zeros. Is there a way to do the same thing, but removing rows that contain only empty tuples instead of rows that contain only zeros?
CodePudding user response:
The problem here is that tuples are sequence types. When you evaluate matrix == (), NumPy does not compare each cell with (). It converts () to an empty array and tries to compare the two arrays as a whole. Their shapes are incompatible, so matrix == () returns a single False instead of a boolean array.
This explains the error "axis 1 is out of bounds for array of dimension 0": a lone False has dimension 0.
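To make the shape mismatch concrete, here is a minimal sketch that inspects the shapes instead of running the comparison itself. It assumes NumPy 1.20 or later (for np.broadcast_shapes), and the names empty_as_array and broadcastable are only illustrative.

```python
import numpy as np

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

# NumPy does not treat () as a single cell value: it converts it to an empty array.
empty_as_array = np.asarray(())
print(matrix.shape)          # (4, 3)
print(empty_as_array.shape)  # (0,)

# Shapes (4, 3) and (0,) cannot be broadcast together,
# so no element-wise comparison can take place.
try:
    np.broadcast_shapes(matrix.shape, empty_as_array.shape)
    broadcastable = True
except ValueError:
    broadcastable = False
print(broadcastable)  # False
```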
A workaround is to test for empty tuples in a different way, for example by vectorizing the len function:
>>> vect_len = np.vectorize(len)
Then, we can do:
>>> matrix = matrix[~np.all(vect_len(matrix) == 0, axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
Or, even simpler:
>>> matrix = matrix[np.any(vect_len(matrix), axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
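One edge case worth knowing about: np.vectorize cannot infer its output type from an array with no elements, so the filter fails on an array that has zero rows unless otypes is given. The sketch below wraps the same idea in a helper. The name drop_empty_rows and the otypes=[int] argument are additions for illustration, not part of the original answer.

```python
import numpy as np

# otypes=[int] lets the vectorized function handle size-0 input,
# where NumPy has no first element to infer the output type from.
vect_len = np.vectorize(len, otypes=[int])

def drop_empty_rows(matrix):
    """Return the rows of a 2D object array that hold at least one non-empty tuple."""
    keep = np.any(vect_len(matrix) > 0, axis=1)
    return matrix[keep]

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

filtered = drop_empty_rows(matrix)
print(filtered.shape)  # (2, 3)

# An array with no rows passes through unchanged instead of raising.
no_rows = np.empty((0, 3), dtype=object)
print(drop_empty_rows(no_rows).shape)  # (0, 3)
```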
CodePudding user response:
As suggested in the comments, do not use NumPy here. NumPy is for numbers, and you don't have numbers. NumPy arrays can hold object values, but there's no benefit here, and you run into problems, as you've seen.
You can just use a list comprehension and the all() function to filter your data.
lines = [
    [(1, 'foo'), (), (4, 'bar')],
    [(), (), ()],
    [(1, 'foo'), (), (3, 'foobar')],
    [(), (), ()]]
lines = [line for line in lines if not all(elem == () for elem in line)]
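The same filter can be packaged as a small function. This is a sketch with a hypothetical name, drop_empty_lines, and it relies on empty tuples being falsy. Note that any() would also treat other falsy elements (None, 0, '') as empty, so it only fits data made of tuples like the example.

```python
def drop_empty_lines(lines):
    """Keep only the lines that contain at least one non-empty tuple."""
    # An empty tuple is falsy, so any(line) is False only when every element is empty.
    return [line for line in lines if any(line)]

lines = [
    [(1, 'foo'), (), (4, 'bar')],
    [(), (), ()],
    [(1, 'foo'), (), (3, 'foobar')],
    [(), (), ()]]

filtered_lines = drop_empty_lines(lines)
print(len(filtered_lines))  # 2
```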
CodePudding user response:
You can traverse the array with a for loop and use the all() function to check whether a row is made up only of empty tuples:
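import numpy as np

# dtype=object is needed: the tuples have different lengths, and recent NumPy
# versions refuse to build an array from ragged input without it.
matrix = np.array([[(1, 'foo'), (), (4, 'bar')], [(), (), ()], [(1, 'foo'), (), (3, 'foobar')], [(), (), ()]], dtype=object)

# Walk the rows from last to first so that deleting a row does not shift
# the indices of the rows that still have to be checked.
for i in range(len(matrix) - 1, -1, -1):
    if all(x == () for x in matrix[i]):
        matrix = np.delete(matrix, i, axis=0)

print(matrix)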
Output:
[[(1, 'foo') () (4, 'bar')]
[(1, 'foo') () (3, 'foobar')]]
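Deleting rows while looping over indices is easy to get wrong when two empty rows are adjacent, because each deletion shifts the rows that follow. As an alternative sketch (not the answerer's code), a boolean mask can be built first and applied once. The variable names keep and filtered are illustrative, and the example deliberately places two empty rows next to each other.

```python
import numpy as np

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')]],
    dtype=object)

# One boolean per row: True when the row holds at least one non-empty tuple.
# dtype=bool keeps the mask valid even if the array has no rows at all.
keep = np.array([any(cell != () for cell in row) for row in matrix], dtype=bool)

# Apply the mask in a single indexing step; no row indices shift along the way.
filtered = matrix[keep]
print(keep.tolist())   # [True, False, False, True]
print(filtered.shape)  # (2, 3)
```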