I am using the R programming language. I am interested in seeing whether the "c" statement can be used along with the "which" statement in R. For example, consider the following code (var1 and var2 are both "Factor" variables):
my_file
var1 var2
1 A AA
2 B CC
3 D CC
4 C AA
5 A BB
ouput <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]
But this does not seem to be working.
I can run each of these conditions individually, e.g.
output <- my_file[which(my_file$var1 == "A" | my_file$var1 == "B" | my_file$var1 == "C"), ]
output1 <- output[which(output$var2 == "AA" | output$var2 == "CC" ), ]
But I would like to run them in a more "compact" form, e.g.:
ouput <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]
Can someone please tell me what I am doing wrong?
Thanks
CodePudding user response:
When you compare my_file$var1 == c("A", "B", "C")
, the comparison will take place element-by-element, but because they are different lengths, the shorter will be repeated (with a warning because the repeating is incomplete.
c("A", "B", "D", "C", "A") == c("A", "B", "C", "A", "B")
giving:
c(TRUE, TRUE, FALSE, FALSE, FALSE)
, then which
will convert to c(1, 2)
.
The reason it works when you use one letter at a time is that the single element is repeated 5 times my_file$var1 == "A"
leads to c("A", "B", "D", "C", "A") == c("A", "A", "A", "A", "A")
and gives the result you expect.
@deschen is right, you should use %in%
output <- my_file[which(my_file$var1 %in% c("A", "B", "C") & !my_file$var2 %in% c("AA", "CC")), ]
CodePudding user response:
As @deschen says in a comment, you should use %in%
rather than ==
. You can also (1) get rid of the which()
(logical indexing works just as well here as indexing by position) and (2) use subset
to avoid re-typing my_file
.
output <- subset(my_file, var1 %in% c("A", "B", "C") &
!(var2 %in% c("AA", "CC")))
Alternatively, if you like the tidyverse, this would be:
library(dplyr)
output <- my_file %>% dplyr::filter(var1 %in% c("A", "B", "C"),
!(var2 %in% c("AA", "CC")))
(comma-separated conditions in filter()
work the same as &
).