I want to use comparison operators between vectors and dataframes. Say, for example, I have a vector vector_test defined in R as
vector_test = c(1, 2, 3)
and a corresponding dataframe A_test defined as
A_test = data.frame(
x1 = c(1, 2, 3, 4, 5),
x2 = c(2, 3, 4, 5, 6),
x3 = c(3, 4, 5, 6, 7)
)
I want to use vector_test to isolate which elements in A_test are greater than or equal to the elements in vector_test. I want the output to be something like
A_test >= vector_test
> TRUE TRUE .
TRUE. TRUE. .
TRUE. TRUE. .
TRUE. ... .
TRUE. ... TRUE
But instead I got a result in which the entry in row 1 of column x2 is FALSE and every other entry is TRUE.
It sounds dumb, I know, but I can't figure out (a) what I'm doing wrong and (b) what comparison R is making.
CodePudding user response:
You need to transpose the matrix first, compare, then transpose back.
> t(t(A_test) >= vector_test)
x1 x2 x3
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
[4,] TRUE TRUE TRUE
[5,] TRUE TRUE TRUE
This way you can make proper use of recycling.
Comparison behaves like arithmetic operations such as addition or multiplication here. Let's demonstrate with addition on a matrix of zeros that has the same dimensions as A_test.
> (M <- array(0, dim=dim(A_test)))
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 0 0 0
[5,] 0 0 0
What you did is similar to:
> M + vector_test
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
[3,] 3 2 1
[4,] 1 3 2
[5,] 2 1 3
What you want is:
> t(t(M) + vector_test)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 1 2 3
[4,] 1 2 3
[5,] 1 2 3
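If the double transpose feels opaque, base R's sweep() is another way to pair each element of the vector with a column. This is only a sketch of an alternative, not what the answer above uses; res_sweep and res_t are names made up for the illustration, and the data frame is converted with as.matrix() first so the comparison runs on a plain matrix.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# MARGIN = 2 lines vector_test[j] up with column j of the matrix,
# so x1 is compared with 1, x2 with 2 and x3 with 3.
res_sweep <- sweep(as.matrix(A_test), 2, vector_test, FUN = `>=`)

# The double-transpose version from above, for comparison.
res_t <- t(t(A_test) >= vector_test)

# Both approaches should agree entry by entry.
all(res_sweep == res_t)
```

sweep() also checks that the length of the vector matches the chosen margin, which can catch a vector of the wrong length that plain recycling would silently accept.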
Data:
> dput(vector_test)
c(1, 2, 3)
> dput(A_test)
structure(list(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 3, 4, 5, 6),
x3 = c(3, 4, 5, 6, 7)), class = "data.frame", row.names = c(NA,
-5L))
CodePudding user response:
What appears to be happening is that the dataframe is coerced to a matrix which is then coerced to a vector for comparison, and the result is converted back to a matrix.
The vector version of A_test is
c(1, 2, 3, 4, 5,
2, 3, 4, 5, 6,
3, 4, 5, 6, 7)
When you compare that to a length 3 vector, the vector is first recycled to length 15, giving this:
c(1, 2, 3, 1, 2,
3, 1, 2, 3, 1,
2, 3, 1, 2, 3)
and then the two vectors are compared element by element. The only FALSE in A_test >= vector_test comes in the 6th entry, where 2 is compared against 3. When converted back to a matrix, that's the first entry in the 2nd column, as you saw.
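The steps described above can be reproduced by hand. This is a minimal sketch that mimics what appears to happen, not a look at R's internals; flat, recycled, cmp and manual are names introduced just for this illustration.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Flatten the data frame column by column (column-major order).
flat <- as.vector(as.matrix(A_test))

# Recycle the short vector to the same length, 15 in this case.
recycled <- rep_len(vector_test, length(flat))

# Element-wise comparison of the two long vectors.
cmp <- flat >= recycled
which(!cmp)                      # position of the only FALSE: 6

# Fold the result back into a 5 x 3 matrix, filling column by column.
manual <- matrix(cmp, nrow = nrow(A_test))
which(!manual, arr.ind = TRUE)   # row 1, column 2

# The hand-built matrix should match what the direct comparison returns.
all(manual == (A_test >= vector_test))
```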
CodePudding user response:
I think you are looking for row-wise comparisons of your data frame to your vector. Perhaps you want
t(apply(A_test, 1, `>=`, vector_test))
#> x1 x2 x3
#> [1,] TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE
#> [4,] TRUE TRUE TRUE
#> [5,] TRUE TRUE TRUE
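Another way to get the same pairing, offered only as a sketch, is to walk over the columns rather than the rows: mapply() hands column j and vector_test[j] to `>=` together, so no transpose is needed. The stopifnot() guard and the name res_cols are additions for this illustration and are not part of the answer above.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Guard against silent recycling: one vector element per column.
stopifnot(length(vector_test) == ncol(A_test))

# A data frame is a list of columns, so mapply() pairs
# x1 with 1, x2 with 2 and x3 with 3, then binds the results by column.
res_cols <- mapply(`>=`, A_test, vector_test)

dim(res_cols)   # 5 rows, 3 columns
all(res_cols)   # TRUE: every column clears its own threshold
```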
What R was doing was automatically recycling your length 3 vector 5 times, then comparing this to the three columns of your data frame stacked as one big vector. If we do this explicitly, we'll see we get the same as your initial result:
vector_test_long <- rep(vector_test, 5)
vector_test_long
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
A_test >= vector_test_long
#> x1 x2 x3
#> [1,] TRUE FALSE TRUE
#> [2,] TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE
#> [4,] TRUE TRUE TRUE
#> [5,] TRUE TRUE TRUE