Can the "c" statement be used along with the "which" statement?

Time:11-14

I am using the R programming language. I am interested in seeing whether the "c" statement can be used along with the "which" statement in R. For example, consider the following code (var1 and var2 are both "Factor" variables):

 my_file

  var1 var2
1    A   AA
2    B   CC
3    D   CC
4    C   AA
5    A   BB

output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]

But this does not seem to be working.

I can run each of these conditions individually, e.g.

output <- my_file[which(my_file$var1 == "A" | my_file$var1 == "B" | my_file$var1 == "C"), ]
output1 <- output[which(output$var2 == "AA" | output$var2 == "CC" ), ]

But I would like to run them in a more "compact" form, e.g.:

output <- my_file[which(my_file$var1 == c("A", "B", "C") & my_file$var2 !== c("AA", "CC")), ]

Can someone please tell me what I am doing wrong?

Thanks

CodePudding user response:

When you compare my_file$var1 == c("A", "B", "C"), the comparison is made element by element. Because the two vectors have different lengths, the shorter one is recycled (with a warning, because 5 is not a multiple of 3).
The comparison actually performed is c("A", "B", "D", "C", "A") == c("A", "B", "C", "A", "B"), which gives c(TRUE, TRUE, FALSE, FALSE, FALSE); which() then converts that to c(1, 2).
It works with one letter at a time because the single element is recycled 5 times: my_file$var1 == "A" becomes c("A", "B", "D", "C", "A") == c("A", "A", "A", "A", "A") and gives the result you expect.
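The recycling described above can be checked directly. This is a minimal sketch that assumes var1 is held as a plain character vector rather than a factor; the name recycled is only for illustration.

```r
# The var1 column from the question, as a plain character vector (assumption).
var1 <- c("A", "B", "D", "C", "A")

# == compares position by position; the length-3 vector is recycled to
# A, B, C, A, B. R warns here because 5 is not a multiple of 3, so the
# warning is suppressed only to keep the output clean.
recycled <- suppressWarnings(var1 == c("A", "B", "C"))
print(recycled)         # TRUE TRUE FALSE FALSE FALSE
print(which(recycled))  # 1 2

# A length-1 value is recycled cleanly to every position.
print(var1 == "A")      # TRUE FALSE FALSE FALSE TRUE
```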

@deschen is right: you should use %in% instead of ==.
output <- my_file[which(my_file$var1 %in% c("A", "B", "C") & !my_file$var2 %in% c("AA", "CC")), ]
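To see what this returns on the sample data, here is a small sketch that rebuilds my_file from the table in the question (the data.frame() call and the intermediate name keep are assumptions for illustration, not part of the original post).

```r
# Rebuild the sample data from the question; both columns are factors.
my_file <- data.frame(
  var1 = factor(c("A", "B", "D", "C", "A")),
  var2 = factor(c("AA", "CC", "CC", "AA", "BB"))
)

# %in% tests each element for membership, so no recycling is involved.
# %in% binds more tightly than !, so the parentheses are optional here.
keep <- my_file$var1 %in% c("A", "B", "C") & !(my_file$var2 %in% c("AA", "CC"))
print(keep)    # FALSE FALSE FALSE FALSE TRUE

output <- my_file[which(keep), ]
print(output)  # only row 5 (var1 = A, var2 = BB) remains
```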

CodePudding user response:

As @deschen says in a comment, you should use %in% rather than ==. You can also (1) get rid of the which() (logical indexing works just as well here as indexing by position) and (2) use subset to avoid re-typing my_file.

output <- subset(my_file, var1 %in% c("A", "B", "C") & 
                         !(var2 %in% c("AA", "CC")))
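As a quick check of the two points above, the following sketch compares indexing by position, logical indexing, and subset() on the sample data. The data.frame() call and the names keep, by_position, by_logical, and by_subset are assumptions made for illustration.

```r
# Rebuild the sample data from the question.
my_file <- data.frame(
  var1 = factor(c("A", "B", "D", "C", "A")),
  var2 = factor(c("AA", "CC", "CC", "AA", "BB"))
)

keep <- my_file$var1 %in% c("A", "B", "C") & !(my_file$var2 %in% c("AA", "CC"))

# Three ways to select the same rows.
by_position <- my_file[which(keep), ]   # integer positions
by_logical  <- my_file[keep, ]          # logical vector used directly
by_subset   <- subset(my_file, var1 %in% c("A", "B", "C") &
                               !(var2 %in% c("AA", "CC")))

print(by_position)
print(by_logical)
print(by_subset)
```

All three select the same row here. which() mainly matters when the condition can contain NA, and %in% never returns NA.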

Alternatively, if you like the tidyverse, this would be:

library(dplyr)
output <- my_file %>% dplyr::filter(var1 %in% c("A", "B", "C"),
                           !(var2 %in% c("AA", "CC")))

(comma-separated conditions in filter() work the same as &).
