Why does R not remove elements properly over an empty set of indices?

Time:11-28

I have encountered some strange behaviour in R. Suppose I have a matrix and I want to remove a specified set of rows and columns. Here is an example where this works perfectly well.

#Create a matrix
MATRIX <- matrix(1:20, nrow = 4, ncol = 5)
rownames(MATRIX) <- c('a', 'b', 'c', 'd')
colnames(MATRIX) <- c('a', 'b', 'c', 'd', 'e')

#Specify rows and columns to remove
REMOVE.ROW <- 3
REMOVE.COL <- 2

#Print the matrix without these rows or columns
MATRIX[-REMOVE.ROW, -REMOVE.COL]

  a  c  d  e
a 1  9 13 17
b 2 10 14 18
d 4 12 16 20

However, when one or both of the objects REMOVE.ROW or REMOVE.COL are empty, instead of removing nothing (and therefore giving back the original matrix), it gives me back an empty matrix.

#Specify rows and columns to remove
REMOVE.ROW <- integer(0)
REMOVE.COL <- integer(0)

#Print the matrix without these rows or columns
MATRIX[-REMOVE.ROW, -REMOVE.COL]

<0 x 0 matrix>

Intuitively, I would have expected the removal of an empty set of indices to leave me with the original set of indices, and so I would have expected the full matrix back from this command. For some reason, R removes all rows and columns from the matrix in this case. As far as I can make out, this appears to be a bug in R, but perhaps there is some good reason for it that I am unaware of.


Question: Can someone explain why R is doing things this way? Aside from using if-then statements to deal with the special cases, is there any simple adjustment I can make to have R behave as I want it to?

CodePudding user response:

Empty objects have the strange property that they are not NULL and have length 0, yet they contain no elements to extract. A possible workaround is to handle each combination of empty and non-empty inputs separately, using the fact that length(integer(0)) is equal to zero. I understand that this solution might not be ideal.

is.na(integer(0))
#> logical(0)
is.null(integer(0))
#> [1] FALSE
length(integer(0))
#> [1] 0
integer(0)[[1]]
#> Error in integer(0)[[1]]: subscript out of bounds
integer(0)[[0]]
#> Error in integer(0)[[0]]: attempt to select less than one element in get1index <real>

MATRIX <- matrix(1:20, nrow = 4, ncol = 5)

REMOVE.ROW <- integer(0)
REMOVE.COL <- integer(0)

#Only negate an index vector when it is non-empty
if (length(REMOVE.ROW) > 0 && length(REMOVE.COL) > 0) {
  MATRIX[-REMOVE.ROW, -REMOVE.COL]
} else if (length(REMOVE.ROW) > 0) {
  MATRIX[-REMOVE.ROW, ]
} else if (length(REMOVE.COL) > 0) {
  MATRIX[, -REMOVE.COL]
} else {
  MATRIX
}
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    9   13   17
#> [2,]    2    6   10   14   18
#> [3,]    3    7   11   15   19
#> [4,]    4    8   12   16   20

Created on 2021-11-27 by the reprex package (v2.0.1)
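
One way to avoid enumerating the four cases, sketched here as an alternative rather than as part of the answer above, is to index with logical vectors. For an empty removal set, %in% returns FALSE everywhere, so its negation keeps every row or column. The names keep.row, keep.col and RESULT are illustrative.

```r
MATRIX <- matrix(1:20, nrow = 4, ncol = 5)
REMOVE.ROW <- integer(0)
REMOVE.COL <- 2

# TRUE for every row/column that is NOT listed for removal
keep.row <- !(seq_len(nrow(MATRIX)) %in% REMOVE.ROW)
keep.col <- !(seq_len(ncol(MATRIX)) %in% REMOVE.COL)

# drop = FALSE keeps the result a matrix even if one row or column remains
RESULT <- MATRIX[keep.row, keep.col, drop = FALSE]
dim(RESULT)
```

Because the logical vectors always have full length, the empty case needs no special handling.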

CodePudding user response:

The problem is that R is using arithmetic negation, not set negation

Based on a helpful comment (hat tip to IceCreamToucan) it appears that this is occurring because of the two-step process involved in indexing matrices using negative indices, which are constructed using arithmetic negation instead of set negation. This appears to be one of those cases where the standard mathematical interpretation of an operation is different to the computational interpretation.

In the mathematical interpretation of indexing a matrix over a set of indices we view set negation as producing a new set composed of elements that are in the original 'sample space' but outside the negated set. In the computational interpretation in R the application of the negative sign is instead producing negative arithmetic values, and these are subsequently interpreted as elements to remove when calling the matrix.
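
To make the arithmetic reading concrete, here is a small sketch on a plain vector (the names x, idx, neg, kept and mixed are illustrative, not from the original post). The minus sign produces an ordinary numeric vector, and it is only the subsetting step that reads the negative values as exclusions.

```r
x <- c(10, 20, 30, 40)
idx <- c(1L, 3L)

# Unary minus is ordinary arithmetic: it yields the values -1 and -3
neg <- -idx

# Subsetting then treats negative values as positions to exclude
kept <- x[neg]

# The sign lives in the values themselves, so R rejects a vector that
# mixes positive and negative positions
mixed <- try(x[c(-1, 2)], silent = TRUE)
inherits(mixed, "try-error")
```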


What is happening in this case: For the usual case where we have a non-empty set of indices, using the negation sign simply turns the indices into negative values and then when we call the matrix it looks over all the indices other than the negative values.

#Specify rows and columns to remove
REMOVE.ROW <- 3
REMOVE.COL <- 2

#Indexing with the negated variables matches indexing with literal negative values
identical(MATRIX[-REMOVE.ROW, -REMOVE.COL], MATRIX[-3, -2])
[1] TRUE

However, when we use an empty vector of indices, the negative of that vector is still the empty vector; that is, integer(0) is identical to its negative -integer(0). Consequently, when we try to remove the empty vector of indices, we are actually asking R to index the matrix by the negative of the empty vector, which is still the empty vector and therefore selects nothing.

#The empty vector is equivalent to its negative
identical(integer(0), -integer(0))
[1] TRUE

#Therefore, calling over these vectors is equivalent
identical(MATRIX[-integer(0), -integer(0)], MATRIX[integer(0), integer(0)])
[1] TRUE

So, the problem here is that you are interpreting -REMOVE.ROW and -REMOVE.COL as if they were using set negation when actually they are just taking the initial vectors of values and turning them negative (i.e., multiplying them by negative one).
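
The same effect can be seen one dimension at a time. In this sketch (which rebuilds the example matrix with 1:20 so that it runs on its own; the name empty is illustrative), an empty index in either position selects zero rows or zero columns, regardless of the minus sign.

```r
MATRIX <- matrix(1:20, nrow = 4, ncol = 5)
empty <- integer(0)

# Empty row index: zero rows, all five columns
dim(MATRIX[-empty, ])

# Empty column index: all four rows, zero columns
dim(MATRIX[, -empty])

# Both empty: the 0 x 0 matrix from the question
dim(MATRIX[-empty, -empty])
```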


Fixing the problem: There does not seem to be a standard function to call the matrix in a way that interprets the indices using set negation, so you will need to use conditional logic to construct the solution for a specific case or for a custom function. Here is a custom function sub.matrix to remove particular rows and columns, where these are interpreted in the sense of set negation.

sub.matrix <- function(x, remove.rows = integer(0), remove.cols = integer(0)) {

  #Check that input x is a matrix
  if (!is.matrix(x)) {
    stop('This function is only for objects of class \'matrix\'')
  }

  #Create output matrix (index the argument x, not the global MATRIX)
  R <- length(remove.rows)
  C <- length(remove.cols)
  if (R > 0 && C > 0) {
    OUT <- x[-remove.rows, -remove.cols]
  } else if (R > 0) {
    OUT <- x[-remove.rows, ]
  } else if (C > 0) {
    OUT <- x[, -remove.cols]
  } else {
    OUT <- x
  }

  #Return the output matrix
  OUT
}
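
As a possible alternative to the conditional logic, base R's setdiff can express the set negation directly by computing the indices to keep. The sketch below is only one way to do it; the function name remove.indices and the variable M are hypothetical, and drop = FALSE is an added choice so that the result stays a matrix.

```r
# Hypothetical helper: remove rows/columns using set difference on indices
remove.indices <- function(x, remove.rows = integer(0), remove.cols = integer(0)) {
  stopifnot(is.matrix(x))
  # Indices to keep = all indices minus the ones to remove
  keep.rows <- setdiff(seq_len(nrow(x)), remove.rows)
  keep.cols <- setdiff(seq_len(ncol(x)), remove.cols)
  x[keep.rows, keep.cols, drop = FALSE]
}

M <- matrix(1:20, nrow = 4, ncol = 5)

# Removing nothing returns the full matrix
identical(remove.indices(M), M)

# Removing row 3 and column 2 leaves a 3 x 4 matrix
dim(remove.indices(M, remove.rows = 3, remove.cols = 2))
```

Since setdiff of anything with an empty set returns the original set, the empty case falls out without a special branch.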