How to remove an element from an unevenly shaped array?

labels_1 = np.array([[-100,32,34,25,2,35,2,5,-100,-100],[-100,35,2,5,-100,-100]])
pred_1 = np.array([[8,32,3,25,2,3,2,5,8],[8,3,2,5,8]])

I want to get rid of the -100s in labels_1 and also remove the elements at the matching indices from pred_1.

For example, the output should be:

labels_1 = np.array([[32,34,25,2,35,2,5],[35,2,5]])
pred_1 = np.array([[32,3,25,2,3,2,5],[3,2,5]])

I tried to use np.where(labels_1 != -100), but it only works for arrays whose lists all have the same length. As you can see, the lists in labels_1 have different lengths, and that is the problem.

Answer:

I believe what you are looking for is:

# filter pred_1 first: it uses labels_1, which still contains the -100s, as the mask
pred_1 = [[b for a, b in zip(la, lb) if a != -100] for la, lb in zip(labels_1, pred_1)]
labels_1 = [[a for a in la if a != -100] for la in labels_1]

Outcome:

>>> labels_1
[[32, 34, 25, 2, 35, 2, 5], [35, 2, 5]]

>>> pred_1
[[32, 3, 25, 2, 3, 2, 5], [3, 2, 5]]
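The np.where idea from the question does work if it is applied to one row at a time, because each row on its own is an ordinary 1-D array. A minimal sketch, assuming plain Python lists as input (recent NumPy will not build the ragged array without dtype=object) and assuming that, as with zip() above, extra trailing elements in the longer row of each pair are ignored:

```python
import numpy as np

labels_1 = [[-100, 32, 34, 25, 2, 35, 2, 5, -100, -100], [-100, 35, 2, 5, -100, -100]]
pred_1 = [[8, 32, 3, 25, 2, 3, 2, 5, 8], [8, 3, 2, 5, 8]]

labels_out, pred_out = [], []
for lab_row, pred_row in zip(labels_1, pred_1):
    # Truncate both rows to their common length, mirroring zip().
    n = min(len(lab_row), len(pred_row))
    lab = np.asarray(lab_row[:n])
    pred = np.asarray(pred_row[:n])
    # np.where works here because each row is a regular 1-D array.
    idx = np.where(lab != -100)[0]
    labels_out.append(lab[idx])
    pred_out.append(pred[idx])

print([row.tolist() for row in labels_out])  # [[32, 34, 25, 2, 35, 2, 5], [35, 2, 5]]
print([row.tolist() for row in pred_out])    # [[32, 3, 25, 2, 3, 2, 5], [3, 2, 5]]
```

The result is a list of 1-D arrays rather than one 2-D array: the Python loop runs over rows only, and the filtering inside each row is vectorized.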

As noted in the comments, you cannot represent ragged arrays as regular NumPy arrays. If you try it as written, NumPy 1.20 to 1.23 emits a loud VisibleDeprecationWarning telling you that, if you really intend this, you must specify dtype=object, and the result is a 1-D object array whose elements are Python lists; from NumPy 1.24 on, the same call raises a ValueError instead. You could look into masked arrays if they fit your needs better, but otherwise you're better off with plain lists (of lists, in this case).
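For the masked-array route, the data first has to be made rectangular. The sketch below is one way to do that, assuming padding is acceptable: every row is padded to a common width, and a single boolean mask hides both the -100 labels and the padding. The names IGNORE, width and valid are illustrative only.

```python
import numpy as np

labels_1 = [[-100, 32, 34, 25, 2, 35, 2, 5, -100, -100], [-100, 35, 2, 5, -100, -100]]
pred_1 = [[8, 32, 3, 25, 2, 3, 2, 5, 8], [8, 3, 2, 5, 8]]
IGNORE = -100  # hypothetical name for the label value to drop

n_rows = len(labels_1)
width = max(len(r) for r in labels_1 + pred_1)

# Rectangular buffers; padding values are never exposed because they are masked.
lab = np.full((n_rows, width), IGNORE)
pred = np.zeros((n_rows, width), dtype=int)
valid = np.zeros((n_rows, width), dtype=bool)  # True where both rows have real data
for i, (lr, pr) in enumerate(zip(labels_1, pred_1)):
    lab[i, :len(lr)] = lr
    pred[i, :len(pr)] = pr
    valid[i, :min(len(lr), len(pr))] = True

# Mask ignored labels and padding; both arrays share the same mask.
mask = (lab == IGNORE) | ~valid
lab_ma = np.ma.array(lab, mask=mask)
pred_ma = np.ma.array(pred, mask=mask)

# compressed() returns the unmasked values of a row as a plain 1-D array.
labels_out = [lab_ma[i].compressed().tolist() for i in range(n_rows)]
pred_out = [pred_ma[i].compressed().tolist() for i in range(n_rows)]
```

The point of keeping lab_ma and pred_ma 2-D is that masked reductions such as lab_ma.sum(axis=1) skip the hidden entries without a Python loop; the per-row compressed() call is only needed when you want the ragged lists back.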

Edit

Although a bit more convoluted, the following is more general: it works with any condition involving a, b, or both, where a is an element of labels_1 and b is the corresponding element of pred_1:

labels_1, pred_1 = map(list, zip(*[
    list(map(list, zip(*[
        (a, b) for a, b in zip(la, lb)
        if a != -100  # you can adapt this condition at will
    ])))
    for la, lb in zip(labels_1, pred_1)
]))

This is more flexible. For example, it lets you keep the elements of both arrays only at positions where one is larger than the other (say a > b), or apply any other condition to the pair of elements.
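One caveat with the nested zip(*...) version: if the condition removes every pair in some row, zip(*[]) yields nothing for that row, the outer zip(*...) then yields nothing at all, and the final unpacking fails with a ValueError. A plain loop with the condition passed in as a function avoids that and is arguably easier to read. filter_pairs and keep are hypothetical names used only for this sketch:

```python
from typing import Callable, List, Sequence, Tuple


def filter_pairs(
    labels: Sequence[Sequence[int]],
    preds: Sequence[Sequence[int]],
    keep: Callable[[int, int], bool],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Keep the elements (a, b) of each row pair for which keep(a, b) is true."""
    labels_out, preds_out = [], []
    for la, lb in zip(labels, preds):
        # zip() stops at the shorter row, as in the answer above.
        pairs = [(a, b) for a, b in zip(la, lb) if keep(a, b)]
        # A row that loses every element becomes an empty list instead of vanishing.
        labels_out.append([a for a, _ in pairs])
        preds_out.append([b for _, b in pairs])
    return labels_out, preds_out


labels_1 = [[-100, 32, 34, 25, 2, 35, 2, 5, -100, -100], [-100, 35, 2, 5, -100, -100]]
pred_1 = [[8, 32, 3, 25, 2, 3, 2, 5, 8], [8, 3, 2, 5, 8]]

# Drop the -100 labels.
print(filter_pairs(labels_1, pred_1, lambda a, b: a != -100))
# → ([[32, 34, 25, 2, 35, 2, 5], [35, 2, 5]], [[32, 3, 25, 2, 3, 2, 5], [3, 2, 5]])

# Keep only positions where the label is larger than the prediction.
print(filter_pairs(labels_1, pred_1, lambda a, b: a > b))
# → ([[34, 35], [35]], [[3, 3], [3]])
```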

Answer:

By putting unevenly sized lists in your NumPy arrays, you defeat the purpose of NumPy arrays, so you can't use NumPy solutions for this.

Nevertheless, you can use list comprehensions for this task:

# filter pred_1 first, using labels_1 (which still contains the -100s) as the mask
pred_1 = np.array([[p for lab, p in zip(ls, ps) if lab != -100] for ls, ps in zip(labels_1, pred_1)], dtype=object)
labels_1 = np.array([[x for x in y if x != -100] for y in labels_1], dtype=object)

Output:

>>> labels_1
array([list([32, 34, 25, 2, 35, 2, 5]), list([35, 2, 5])], dtype=object)

>>> pred_1
array([list([32, 3, 25, 2, 3, 2, 5]), list([3, 2, 5])], dtype=object)
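If the data is large and the per-row Python loop becomes a bottleneck, a common alternative is to stop storing the rows as a ragged array at all: keep one flat array per variable plus the row lengths, filter the flat arrays with a single boolean mask, and split them back into rows only when needed. This is a sketch of that idea under the assumption that, as with zip(), each label row and its prediction row are truncated to their common length; it is not taken from the answers above.

```python
import numpy as np

labels_1 = [[-100, 32, 34, 25, 2, 35, 2, 5, -100, -100], [-100, 35, 2, 5, -100, -100]]
pred_1 = [[8, 32, 3, 25, 2, 3, 2, 5, 8], [8, 3, 2, 5, 8]]

# Truncate each pair of rows to their common length (what zip() does).
lengths = [min(len(la), len(pr)) for la, pr in zip(labels_1, pred_1)]
flat_lab = np.concatenate([np.asarray(la[:n]) for la, n in zip(labels_1, lengths)])
flat_pred = np.concatenate([np.asarray(pr[:n]) for pr, n in zip(pred_1, lengths)])

# One vectorized comparison over all rows at once.
keep = flat_lab != -100

# Count the kept items per row, then split the filtered flat arrays
# back into rows at the cumulative boundaries.
row_ids = np.repeat(np.arange(len(lengths)), lengths)
counts = np.bincount(row_ids[keep], minlength=len(lengths))
bounds = np.cumsum(counts)[:-1]
labels_out = np.split(flat_lab[keep], bounds)
pred_out = np.split(flat_pred[keep], bounds)

print([row.tolist() for row in labels_out])  # [[32, 34, 25, 2, 35, 2, 5], [35, 2, 5]]
print([row.tolist() for row in pred_out])    # [[32, 3, 25, 2, 3, 2, 5], [3, 2, 5]]
```

Only the final np.split produces per-row arrays; the comparison and the boolean indexing each run once over all the data.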