I recently came across the df[logicaldf, ] command and was really confused, as I have only seen the df[x, ] form, where x is a column in df. Here is an example chunk from when I ran the code in R:
> c1 <- c(11, 2, 3, 4, 53)
> c2 <- c(9, 3, 5, 5, 2)
> c3 <- c(1, 10, 3, 2, 2)
> foo <- data.frame(c1, c2, c3)
> foo5 <- foo > 5
> head(foo5)
        c1    c2    c3
[1,]  TRUE  TRUE FALSE
[2,] FALSE FALSE  TRUE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE FALSE
[5,]  TRUE FALSE FALSE
> table(rowSums(foo5))
0 1 2
2 2 1
> foo[foo5, ]
     c1 c2 c3
1    11  9  1
5    53  2  2
NA   NA NA NA
NA.1 NA NA NA
Could someone explain what is happening here?
Answer:
If you omit the comma, you extract just the elements where foo5 is TRUE:
foo[foo5]
## 11 53 9 10
However, because you include the comma, foo5 is used as a row index: the logical matrix is flattened column by column into a plain logical vector. Thus, inside the brackets, foo5 becomes:
as.vector(as.matrix(foo5))
## TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
Now, the four TRUE values occur at positions 1, 5, 6, and 12. So your subset command (foo[foo5, ]) is trying to grab rows 1, 5, 6, and 12 from foo. The result is rows 1 and 5, plus two rows of NAs, because foo doesn't have rows 6 and 12.
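To see those positions directly, here is a small sketch that rebuilds foo and foo5 from the question. The name true_pos is introduced only for illustration. It flattens the matrix, lists where the TRUE values fall, and shows that a logical index longer than the vector it subsets gives NA for every TRUE past the end:

```r
# Rebuild the objects from the question
c1 <- c(11, 2, 3, 4, 53)
c2 <- c(9, 3, 5, 5, 2)
c3 <- c(1, 10, 3, 2, 2)
foo <- data.frame(c1, c2, c3)
foo5 <- foo > 5

# Positions of TRUE after flattening the matrix column by column
# (true_pos is a hypothetical helper name)
true_pos <- which(as.vector(foo5))
true_pos
## [1]  1  5  6 12

# Index a length-5 column with the 15-element logical vector:
# the TRUE values at positions 6 and 12 fall past the end, so R returns NA
foo$c1[as.vector(foo5)]
## [1] 11 53 NA NA
```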
Notice we can replicate your result like so:
foo[c(1,5,6,12), ]
##      c1 c2 c3
## 1    11  9  1
## 5    53  2  2
## NA   NA NA NA
## NA.1 NA NA NA
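If the goal was to keep the rows that contain at least one value above 5 (an assumption, since the question does not say what was intended), one possible sketch is to reduce the logical matrix to a single TRUE/FALSE per row before subsetting. The name keep is introduced here only for illustration:

```r
# Rebuild the objects from the question
c1 <- c(11, 2, 3, 4, 53)
c2 <- c(9, 3, 5, 5, 2)
c3 <- c(1, 10, 3, 2, 2)
foo <- data.frame(c1, c2, c3)
foo5 <- foo > 5

# One logical value per row: TRUE if any column in that row exceeds 5
# (keep is a hypothetical helper name)
keep <- rowSums(foo5) > 0

# A length-5 logical vector matches the 5 rows of foo, so no NA rows appear
foo[keep, ]
##   c1 c2 c3
## 1 11  9  1
## 2  2  3 10
## 5 53  2  2
```

To require every column to exceed 5 instead, the condition could be rowSums(foo5) == ncol(foo5).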