numpy find unique rows (only appeared once)

Time:05-11

For example, I get several sub-arrays by splitting one array A according to the sizes listed in B:

import numpy as np

A = np.array([[1,1,1],
              [2,2,2],
              [2,3,4],
              [5,8,10],
              [5,9,9],
              [7,9,6],
              [1,1,1],
              [2,2,2],
              [9,2,4],
              [9,3,6],
              [10,3,3],
              [11,2,2]])
B = np.array([5,7])
C = np.split(A,B.cumsum()[:-1])
>>> print(C)
[array([[ 1,  1,  1],
        [ 2,  2,  2],
        [ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 1,  1,  1],
        [ 2,  2,  2],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]

How can I keep only the rows that appear exactly once across all the sub-arrays (that is, drop every row that appears more than once)? Since [1,1,1] and [2,2,2] each appear twice in C, the result should look like this:

[array([[ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]

CodePudding user response:

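You can use np.unique with return_counts=True to find the rows of A that occur exactly once, turn that into a boolean mask over the rows of A, and then split the mask at the same positions as A: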

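# first-occurrence index and occurrence count of each distinct row
_, first_idx, counts = np.unique(A, axis=0, return_index=True, return_counts=True)

# True for the rows of A that occur exactly once
mask = np.isin(np.arange(len(A)), first_idx[counts == 1])

# split A and the mask at the same positions, then filter each sub-array
split_at = B.cumsum()[:-1]
out = [a[m] for a, m in zip(np.split(A, split_at), np.split(mask, split_at))]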

output:

[array([[ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]