Using Comparison Operators Between Datasets and Vectors in R: What am I doing wrong?

I want to use comparison operators between vectors and dataframes. Say, for example, I have a vector vector_test defined in R as

vector_test = c(1, 2, 3)

and a corresponding dataframe A_test defined as

A_test = data.frame(
               x1 = c(1, 2, 3, 4, 5),
               x2 = c(2, 3, 4, 5, 6),
               x3 = c(3, 4, 5, 6, 7)
)

I want to use vector_test to find which elements in each column of A_test are greater than or equal to the corresponding element of vector_test (x1 against 1, x2 against 2, x3 against 3). I want the output to be something like

A_test >= vector_test

> TRUE  TRUE        .
  TRUE. TRUE.       .
  TRUE. TRUE.       .
  TRUE.  ...        .
  TRUE.  ...        TRUE

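But instead I got

       x1    x2   x3
[1,] TRUE FALSE TRUE
[2,] TRUE  TRUE TRUE
[3,] TRUE  TRUE TRUE
[4,] TRUE  TRUE TRUE
[5,] TRUE  TRUE TRUE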

It sounds dumb, I know, but I can't figure out (a) what I'm doing wrong and (b) what comparison R is making.

CodePudding user response：

You need to transpose the data frame first, compare, then transpose back.

> t(t(A_test) >= vector_test)
       x1   x2   x3
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
[4,] TRUE TRUE TRUE
[5,] TRUE TRUE TRUE

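This way recycling works in your favour: R recycles the vector down the columns, and after transposing, each column of t(A_test) has exactly three entries that line up with the three elements of vector_test.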

It's similar to other arithmetic operations such as addition, multiplication, etc. Let's demonstrate addition on a matrix of zeros with the same dimensions as A_test.

> (M <- array(0, dim=dim(A_test)))
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
[5,]    0    0    0

What you did is similar to:

> M + vector_test
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2    1
[4,]    1    3    2
[5,]    2    1    3

What you want is:

> t(t(M) + vector_test)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3
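
As a rough cross-check of the transpose trick, the sketch below spells out the comparison it performs by building the per-column thresholds explicitly. The names threshold, explicit and via_transpose are introduced here for illustration only and are not part of the original answer.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Fill a 5 x 3 matrix row by row, so column j holds vector_test[j] in every row
threshold <- matrix(vector_test, nrow = nrow(A_test), ncol = ncol(A_test),
                    byrow = TRUE)

# Element-wise comparison against the explicit thresholds
explicit <- as.matrix(A_test) >= threshold

# The transpose approach from the answer
via_transpose <- t(t(A_test) >= vector_test)

all(explicit == via_transpose)  # TRUE: both give the same logical matrix
```

Writing the thresholds out this way is more verbose than transposing twice, but it makes the intended pairing of columns and vector elements visible.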

Data:

> dput(vector_test)
c(1, 2, 3)
> dput(A_test)
structure(list(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 3, 4, 5, 6), 
    x3 = c(3, 4, 5, 6, 7)), class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response：

What appears to be happening is that the dataframe is coerced to a matrix which is then coerced to a vector for comparison, and the result is converted back to a matrix.

The vector version of A_test is

c(1, 2, 3, 4, 5, 
  2, 3, 4, 5, 6,
  3, 4, 5, 6, 7)

When you compare that to a length 3 vector, the vector is first recycled to length 15, giving this:

c(1, 2, 3, 1, 2,
  3, 1, 2, 3, 1,
  2, 3, 1, 2, 3)

and then elements are compared to the vector from the dataframe. The only FALSE in A_test >= vector_test comes in the 6th entry. When converted back to a matrix, that's the first entry in the 2nd column, as you saw.
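
The steps described above can be reproduced by hand. This is a sketch of the effect rather than of R's internal code path; flat, recycled and manual are illustrative names.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Stack the columns into one vector (column-major order)
flat <- unlist(A_test, use.names = FALSE)

# Recycle the length-3 vector out to the same length (15)
recycled <- rep_len(vector_test, length(flat))

# Compare element by element, then fold the result back into 5 rows
manual <- matrix(flat >= recycled, nrow = nrow(A_test),
                 dimnames = list(NULL, names(A_test)))

which(!(flat >= recycled))               # 6: position of the only FALSE
all(manual == (A_test >= vector_test))   # TRUE: same as the direct comparison
```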

CodePudding user response：

I think you are looking for row-wise comparisons of your data frame to your vector. Perhaps you want

t(apply(A_test, 1, `>=`, vector_test))
#>        x1   x2   x3
#> [1,] TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE
#> [4,] TRUE TRUE TRUE
#> [5,] TRUE TRUE TRUE
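
If the goal is really "compare column j with element j", two other base R idioms should give the same matrix. This is a hedged sketch of alternatives, not something from the original answers: sweep() applies a function along a margin (2 means columns), and mapply() walks over the columns and the vector elements in parallel.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Row-wise apply, as in the answer above
by_apply <- t(apply(A_test, 1, `>=`, vector_test))

# Sweep vector_test across the columns (MARGIN = 2) using >=
by_sweep <- sweep(as.matrix(A_test), 2, vector_test, FUN = ">=")

# Pair each column with its own threshold
by_mapply <- mapply(`>=`, A_test, vector_test)

all(by_apply == by_sweep)   # TRUE
all(by_apply == by_mapply)  # TRUE
```

Which of these reads best is a matter of taste; all of them avoid relying on how a short vector gets recycled against the whole data frame.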

What R was doing was automatically recycling your length 3 vector 5 times, then comparing this to the three columns of your data frame stacked as one big vector. If we do this explicitly, we'll see we get the same as your initial result:

vector_test_long <- rep(vector_test, 5)

vector_test_long
#>  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

A_test >= vector_test_long
#>        x1    x2   x3
#> [1,] TRUE FALSE TRUE
#> [2,] TRUE  TRUE TRUE
#> [3,] TRUE  TRUE TRUE
#> [4,] TRUE  TRUE TRUE
#> [5,] TRUE  TRUE TRUE