delete consecutive duplicates in numpy array

Time:10-18

I want to delete the repeated rows of each event trigger in this NumPy array, i.e. keep only the first row of each run. The event triggers are encoded in the 3rd column (starting with 2, 1, 3, 4, 5). How can I delete the consecutive rows that have the same event?

[[  108     0     2]
 [  323     0     2]
 [  543     0     1]
 [  758     0     1]
 [  988     0     3]
 [ 1203     0     3]
 [ 1443     0     4]
 [ 1658     0     4]
 [ 1868     0     5]
 [ 2083     0     5]
 [ 2333     0     5]
 [ 2546     0     5]
 [ 2786     0     4]
 [ 3000     0     4]
 [ 3211     0     1]
 [ 3425     0     1]
 [ 3645     0     2]
 [ 3860     0     2]
 [ 4100     0     3]
 [ 4315     0     3]
 [ 4525     0     3]
 [ 4738     0     3]
 [ 4978     0     2]
 [ 5193     0     2]...

I would be really thankful for some help, thank you in advance!
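Stated without NumPy, the goal is to keep the first row of every run of equal event codes, not the first row of every distinct code. The following minimal plain-Python sketch of that definition assumes the rows are available as a list of lists. It uses the standard library's itertools.groupby, which only groups adjacent items. The names rows and first_of_each_run are illustrative.

```python
from itertools import groupby

# First 14 rows from the question: [timestamp, 0, event_code]
rows = [
    [108, 0, 2], [323, 0, 2],
    [543, 0, 1], [758, 0, 1],
    [988, 0, 3], [1203, 0, 3],
    [1443, 0, 4], [1658, 0, 4],
    [1868, 0, 5], [2083, 0, 5], [2333, 0, 5], [2546, 0, 5],
    [2786, 0, 4], [3000, 0, 4],
]

# groupby() only groups *adjacent* rows with an equal key, so a code that
# reappears later (like 4 at 2786) starts a new group instead of merging.
first_of_each_run = [next(group) for _, group in groupby(rows, key=lambda r: r[2])]

for row in first_of_each_run:
    print(row)
```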

Answer:

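You can use np.unique with the parameter return_index=True, which returns the index of the first occurrence of each value. Note that this removes every repeated event code, not only consecutive repeats. On the full array from the question, later runs such as the 4s starting at 2786 would be dropped as well. It only matches the consecutive-duplicate result when each code appears in a single run, as in the shortened array below: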

import numpy as np

arr = np.array([[  108,     0,     2],
                [  323,     0,     2],
                [  543,     0,     1],
                [  758,     0,     1],
                [  988,     0,     3],
                [ 1203,     0,     3],
                [ 1443,     0,     4],
                [ 1658,     0,     4],
                [ 1868,     0,     5],
                [ 2083,     0,     5],
                [ 2333,     0,     5],
                [ 2546,     0,     5]])
_, index = np.unique(arr[:, 2], return_index=True)
print(arr[index])

Output:

[[ 543    0    1]
 [ 108    0    2]
 [ 988    0    3]
 [1443    0    4]
 [1868    0    5]]

np.unique returns its results sorted by value, so the rows come out ordered by the event code (column index 2). To restore the original order, sort again on the first column (index 0), which is increasing in your original array:

new_arr = arr[index]
print(new_arr[new_arr[:, 0].argsort()])

Output:

[[ 108    0    2]
 [ 543    0    1]
 [ 988    0    3]
 [1443    0    4]
 [1868    0    5]]
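np.unique keeps the first occurrence of each distinct code. On the full 24-row array from the question, it therefore also discards later runs of a code that has already appeared. The following minimal sketch rebuilds that array with np.column_stack (the variable names are illustrative). It also shows that sorting the indices returned by return_index restores the original row order directly. This is equivalent to the argsort on the first column as long as that column is increasing.

```python
import numpy as np

# Full array from the question, one row per trigger: [timestamp, 0, event_code]
times = [108, 323, 543, 758, 988, 1203, 1443, 1658, 1868, 2083, 2333, 2546,
         2786, 3000, 3211, 3425, 3645, 3860, 4100, 4315, 4525, 4738, 4978, 5193]
codes = [2, 2, 1, 1, 3, 3, 4, 4, 5, 5, 5, 5, 4, 4, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2]
full = np.column_stack([times, np.zeros(len(times), dtype=int), codes])

# return_index gives the position of the first occurrence of each *distinct* code
_, index = np.unique(full[:, 2], return_index=True)

# Sorting the positions restores the original row order, which here gives the
# same result as the argsort on column 0 because the timestamps are increasing.
unique_rows = full[np.sort(index)]
print(unique_rows)
# Only 5 rows remain: the later runs (4 at 2786, 1 at 3211, ...) are gone.
```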

Answer:

input:

import numpy as np

a = np.array([
    [  108,0,2],
    [  323,0,2],
    [  543,0,1],
    [  758,0,1],
    [  988,0,3],
    [ 1203,0,3],
    [ 1443,0,4],
    [ 1658,0,4],
    [ 1868,0,5],
    [ 2083,0,5],
    [ 2333,0,5],
    [ 2546,0,5],
    [ 2786,0,4],
    [ 3000,0,4],
    [ 3211,0,1],
    [ 3425,0,1],
    [ 3645,0,2],
    [ 3860,0,2],
    [ 4100,0,3],
    [ 4315,0,3],
    [ 4525,0,3],
    [ 4738,0,3],
    [ 4978,0,2],
    [ 5193,0,2]
])

solution:

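a[np.concatenate(([True], np.diff(a[:, -1]) != 0))]

np.diff compares each event code with the one before it, so this keeps a row whenever its code differs from the previous row's code. True is prepended because the first row has no predecessor and always starts a new run. This keeps the first row of every run, including the last run (4978), whereas subtracting 1 from the np.where positions would pick the wrong row for runs longer than two and would miss the last run.

output:

array([[ 108,    0,    2],
       [ 543,    0,    1],
       [ 988,    0,    3],
       [1443,    0,    4],
       [1868,    0,    5],
       [2786,    0,    4],
       [3211,    0,    1],
       [3645,    0,    2],
       [4100,    0,    3],
       [4978,    0,    2]])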

performance:

36.8 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
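If this is needed for several recordings, the run-boundary test can be wrapped in a small function. The following sketch is illustrative only. drop_consecutive_duplicates and its keep parameter are hypothetical names, not a NumPy API. Instead of calling np.diff, it compares each row with the one before it, which avoids subtracting the codes and also works for non-numeric columns that support !=. It also handles empty and single-row input.

```python
import numpy as np

def drop_consecutive_duplicates(arr, col=-1, keep="first"):
    """Keep one row per run of equal values in column `col` (hypothetical helper).

    keep="first" keeps the first row of each run; keep="last" keeps the last.
    """
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return arr
    # True where a row's value differs from the row before it (a run boundary).
    changed = arr[1:, col] != arr[:-1, col]
    if keep == "first":
        mask = np.concatenate(([True], changed))
    elif keep == "last":
        mask = np.concatenate((changed, [True]))
    else:
        raise ValueError("keep must be 'first' or 'last'")
    return arr[mask]

a = np.array([[108, 0, 2], [323, 0, 2], [543, 0, 1], [758, 0, 1],
              [1868, 0, 5], [2083, 0, 5], [2333, 0, 5], [2786, 0, 4]])
print(drop_consecutive_duplicates(a))               # first row of each run
print(drop_consecutive_duplicates(a, keep="last"))  # last row of each run
```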