Keep vector elements or matrix columns that only appear once

Time:07-04

I have a matrix:

A <- t(matrix(
c(0, 0, 1,
  0, 0, 0,
  0, 0, 1,
  0, 0, 1,
  0, 0, 0,
  1, 1, 0), 3, 6))

and I need to keep columns that appear only once. So, the expected result is just the 3rd column: (1, 0, 1, 1, 0, 0).

I have found the unique and duplicated functions, but they keep one copy of each repeated column. I need something stronger that deletes every column appearing more than once (in my example, the 1st and 2nd).
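A quick check on the matrix above shows why duplicated (and likewise unique) is not enough on its own. It flags only the later copies of a repeated column, so one copy survives. This is a small sketch; first_pass and dedup are just illustrative names.

```r
A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))

# duplicated() compares rows, so transpose A to compare its columns.
# Only the later copy of a repeated column is flagged (column 2 here).
first_pass <- duplicated(t(A))
print(first_pass)

# Dropping the flagged columns still keeps column 1, the first copy of the
# repeated column, so two columns remain; unique(A, MARGIN = 2) behaves the same way
dedup <- A[, !first_pass, drop = FALSE]
ncol(dedup)
```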

Answer:

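Looks like we need a double duplicated. duplicated flags every occurrence of a column except the first, and duplicated(..., fromLast = TRUE) flags every occurrence except the last, so a column flagged by neither pass occurs exactly once. Since duplicated compares rows, the matrix is transposed first.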

A[, !(duplicated(t(A)) | duplicated(t(A), fromLast = TRUE)), drop = FALSE]
     [,1]
[1,]    1
[2,]    0
[3,]    1
[4,]    1
[5,]    0
[6,]    0
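To see why the double pass works, here are the two logical vectors for the example matrix, computed separately. This is a minimal sketch; fwd, bwd and keep are illustrative names.

```r
A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))

fwd  <- duplicated(t(A))                   # flags later copies: column 2
bwd  <- duplicated(t(A), fromLast = TRUE)  # flags earlier copies: column 1
keep <- !(fwd | bwd)                       # flagged by neither: occurs once
print(keep)

A[, keep, drop = FALSE]
```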

The idea applies to a vector, too.

x <- c(1, 1, 2, 3, 4, 3, 4, 5)

x[!(duplicated(x) | duplicated(x, fromLast = TRUE))]
[1] 2 5
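If you need this for both cases, the two expressions can be wrapped in one function. The sketch below is a hypothetical helper, keep_singletons, which is not part of base R. It simply picks the matrix or vector form with is.matrix().

```r
# Hypothetical helper: keep the vector elements or matrix columns that occur once
keep_singletons <- function(x) {
  if (is.matrix(x)) {
    key <- t(x)  # duplicated() compares rows, so compare columns through t()
    x[, !(duplicated(key) | duplicated(key, fromLast = TRUE)), drop = FALSE]
  } else {
    x[!(duplicated(x) | duplicated(x, fromLast = TRUE))]
  }
}

keep_singletons(c(1, 1, 2, 3, 4, 3, 4, 5))  # 2 5

A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))
keep_singletons(A)  # only the 3rd column
```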

Answer:

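We can try the base R code below. It uses aggregate to collapse each column of A into a single string (with toString), then uses ave to count how many columns share each string, and keeps only the columns whose string occurs once: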

with(
  aggregate(
    . ~ id,
    data.frame(id = c(col(A)), val = c(A)),
    toString
  ),
  A[, ave(id, val, FUN = length) == 1, drop = FALSE]
)

or, equivalently,

A[
  ,
  with(
    aggregate(
      . ~ id,
      data.frame(id = c(col(A)), val = c(A)),
      toString
    ),
    ave(id, val, FUN = length) == 1
  ),
  drop = FALSE
]

which gives

     [,1]
[1,]    1
[2,]    0
[3,]    1
[4,]    1
[5,]    0
[6,]    0
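To see what the aggregate step produces, it can be run on its own. Each row of agg stands for one column of A, with that column's values joined into one string, and ave() then counts how often each string occurs. This is a sketch; agg and counts are illustrative names. Note that the comparison goes through the string form of the values, so it assumes that as.character() tells different values apart.

```r
A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))

# One row per column of A: id is the column index and val is that
# column's values joined into a single string by toString()
agg <- aggregate(. ~ id, data.frame(id = c(col(A)), val = c(A)), toString)
print(agg)

# For each column, the number of columns that have the same string
counts <- with(agg, ave(id, val, FUN = length))
print(counts)  # columns 1 and 2 share a string, column 3 is alone

A[, counts == 1, drop = FALSE]
```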

Answer:

Another possible solution:

# the \(x) lambda shorthand requires R >= 4.1
A[, apply(sapply(which(duplicated(A, MARGIN = 2)),
                 \(x) sapply(1:ncol(A), \(y) all(A[, x] == A[, y]))),
          1, \(z) all(!z)), drop = FALSE]

#>      [,1]
#> [1,]    1
#> [2,]    0
#> [3,]    1
#> [4,]    1
#> [5,]    0
#> [6,]    0
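Broken into steps, the expression above builds a logical matrix that compares every column of A with each duplicated column. The sketch below uses function(x) instead of \(x) so it also runs on R versions before 4.1; dup, same and keep are illustrative names.

```r
A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))

# Columns that repeat an earlier column (here just column 2)
dup <- which(duplicated(A, MARGIN = 2))

# For this A, a 3 x 1 logical matrix: row y is TRUE when column y of A
# equals the duplicated column
same <- sapply(dup, function(x)
  sapply(seq_len(ncol(A)), function(y) all(A[, x] == A[, y])))
print(same)

# A column is kept only if it matches none of the duplicated columns
keep <- apply(same, 1, function(z) all(!z))
print(keep)
A[, keep, drop = FALSE]
```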

Or:

# the \(x, y) lambda shorthand requires R >= 4.1
A[, colSums(outer(which(duplicated(A, MARGIN = 2)), 1:ncol(A),
                  Vectorize(\(x, y) all(A[, x] == A[, y])))) == 0, drop = FALSE]

#>      [,1]
#> [1,]    1
#> [2,]    0
#> [3,]    1
#> [4,]    1
#> [5,]    0
#> [6,]    0
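Both expressions start from which(duplicated(A, MARGIN = 2)), which is empty when no column repeats. In the apply() version, sapply() then returns an empty list and apply() stops with an error because its input has no dimensions. The sketch below is a hypothetical wrapper, keep_unique_cols_outer (not part of any package), around the outer() variant. It adds an explicit guard so the empty case never reaches outer().

```r
A <- t(matrix(c(0, 0, 1, 0, 0, 0, 0, 0, 1,
                0, 0, 1, 0, 0, 0, 1, 1, 0), 3, 6))

# Hypothetical wrapper around the outer() approach
keep_unique_cols_outer <- function(A) {
  dup <- which(duplicated(A, MARGIN = 2))
  # Nothing is repeated, so every column already appears only once
  if (length(dup) == 0) return(A)
  # same[i, j] is TRUE when column j equals the duplicated column dup[i]
  same <- outer(dup, seq_len(ncol(A)),
                Vectorize(function(x, y) all(A[, x] == A[, y])))
  A[, colSums(same) == 0, drop = FALSE]
}

keep_unique_cols_outer(A)        # only column 3
keep_unique_cols_outer(diag(3))  # no repeats: all three columns kept
```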