For two dataframes df and logicaldf of the same size, what does the command df[logicaldf, ] do?

Time:10-21

I recently came across the df[logicaldf, ] command and was confused, as I have only seen the df[x, ] form, where x is a column in df. Here is an example I ran in R:

> c1 <- c(11, 2, 3, 4, 53)
> c2 <- c(9, 3, 5, 5, 2)
> c3 <- c(1, 10, 3, 2, 2)
> foo <- data.frame(c1, c2, c3)
 
> foo5 <- foo > 5
> head(foo5)
        c1    c2    c3
[1,]  TRUE  TRUE FALSE
[2,] FALSE FALSE  TRUE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE FALSE
[5,]  TRUE FALSE FALSE

> table(rowSums(foo5))
0 1 2 
2 2 1 

> foo[foo5, ]
     c1 c2 c3
1    11  9  1
5    53  2  2
NA   NA NA NA
NA.1 NA NA NA

Could someone explain what is happening here?

CodePudding user response:

If you omit the comma, then you would just extract the elements where foo5==TRUE:

foo[foo5]
## 11 53  9 10
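
To see where those four values come from, here is a small sketch that rebuilds foo from the question and looks up the row and column of each TRUE. The name hits is only illustrative. As far as I know, indexing a data frame with a logical matrix and no comma behaves like indexing as.matrix(foo), so the values come out column by column:

```r
c1 <- c(11, 2, 3, 4, 53)
c2 <- c(9, 3, 5, 5, 2)
c3 <- c(1, 10, 3, 2, 2)
foo <- data.frame(c1, c2, c3)
foo5 <- foo > 5

# Row/column position of every TRUE, listed column by column
hits <- which(foo5, arr.ind = TRUE)
print(hits)

# Without the comma, the logical matrix picks individual cells
print(foo[foo5])
print(as.matrix(foo)[foo5])
```

The cells are (1, c1), (5, c1), (1, c2) and (2, c3), which is why the values appear in the order 11, 53, 9, 10.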

However, because you include the comma, foo5 is used as a row index. A row index has to be a vector, so the logical matrix is flattened column by column. Inside the brackets, foo5 is effectively:

as.vector(as.matrix(foo5))
## TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

Now, the 4 TRUEs occur in positions 1, 5, 6, and 12. So your subset command (foo[foo5,]) is trying to grab rows 1, 5, 6, and 12 from foo. Thus, the result is rows 1 and 5, and two rows of NAs because foo doesn't have rows 6 and 12.
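
A quick way to check those positions yourself (a sketch using the same data as the question; idx and missing_rows are just names I picked):

```r
c1 <- c(11, 2, 3, 4, 53)
c2 <- c(9, 3, 5, 5, 2)
c3 <- c(1, 10, 3, 2, 2)
foo <- data.frame(c1, c2, c3)
foo5 <- foo > 5

# Positions of TRUE once the matrix is flattened column by column
idx <- which(as.vector(foo5))
print(idx)

# Positions beyond the last row of foo have no row to return
missing_rows <- idx[idx > nrow(foo)]
print(missing_rows)
```

foo has only 5 rows, so every position larger than 5 turns into a row of NAs.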

Notice we can replicate your result like so:

foo[c(1,5,6,12), ]
##      c1 c2 c3
## 1    11  9  1
## 5    53  2  2
## NA   NA NA NA
## NA.1 NA NA NA
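
If what you actually wanted was to keep whole rows based on foo5 (that is my assumption; the question doesn't say), you need one TRUE/FALSE per row rather than one per cell. A minimal sketch using rowSums, which you already call in your example; the names any_gt5, all_gt5, rows_any and rows_all are made up for illustration:

```r
foo <- data.frame(c1 = c(11, 2, 3, 4, 53),
                  c2 = c(9, 3, 5, 5, 2),
                  c3 = c(1, 10, 3, 2, 2))
foo5 <- foo > 5

# One logical value per row: does the row contain at least one value > 5?
any_gt5 <- rowSums(foo5) > 0
rows_any <- foo[any_gt5, ]
print(rows_any)

# One logical value per row: is every value in the row > 5?
all_gt5 <- rowSums(foo5) == ncol(foo)
rows_all <- foo[all_gt5, ]
print(rows_all)
```

Because the index now has exactly nrow(foo) elements, no NA rows appear. With this data the first version keeps rows 1, 2 and 5, and the second keeps none.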