How to use '%in%' operator in R?

Time:11-24

I have been using the %in% operator ever since I learned about it.

However, I still don't fully understand how it works. I thought I did, but I am never sure which operand should go on which side.

Here you have an example:

This is my dataframe:

df <- data.frame("col1"=c(1,2,3,4,30,21,320,123,4351,1234,3,0,43), "col2"=rep("something",13))

This is how it looks:

> df
   col1      col2
1     1 something
2     2 something
3     3 something
4     4 something
5    30 something
6    21 something
7   320 something
8   123 something
9  4351 something
10 1234 something
11    3 something
12    0 something
13   43 something

Let's say I have a numerical vector:

myvector <- c(30,43,12,333334,14,4351,0,5,55,66)

I want to check whether all (or some) of the numbers in my vector appear in the dataframe above. To do that, I always use %in%.

I thought of two approaches:

#common in both: 30, 4351, 0, 43

# are the numbers from df$col1 in my vector?

    trial1 <- subset(df, df$col1 %in% myvector)

# are the numbers of the vector in df$col1?

    trial2 <- subset(df, myvector %in% df$col1)

Both approaches make sense to me and they should give the same result. However, only the result from trial1 is okay.

> trial1
   col1      col2
5    30 something
9  4351 something
12    0 something
13   43 something

What I don't understand is why the second approach returns some of the common numbers along with others that are not in the vector:

> trial2
   col1      col2
1     1 something
2     2 something
6    21 something
7   320 something
11    3 something
12    0 something

Could someone explain to me how the %in% operator works and why the second approach gives me the wrong result?

Thanks very much in advance

Regards

CodePudding user response:

An answer has already been given, but for a bit more detail, simply look at the result of %in%:

df$col1 %in% myvector
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE

The one above is correct: it has one value per row of df, so when you subset df you keep the rows where it is TRUE, i.e. rows 5, 9, 12 and 13.

versus

myvector %in% df$col1
# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE

This one goes wrong: you subset df and tell it to keep rows 1, 2, 6 and 7, but this logical vector has only 10 elements while df has 13 rows, so R recycles it. Rows 11, 12 and 13 get the first three values again (TRUE, TRUE, FALSE), so rows 11 and 12 end up in your subset as well.
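To make the recycling visible, here is a minimal sketch that rebuilds the data from the question and repeats the 10-element result by hand. The names mask_df, mask_vec, recycled, rows_in_vector and values_in_df are only illustrative and do not appear in the question. The point is that x %in% table always returns one logical value per element of x, the left-hand side.

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 30, 21, 320, 123, 4351, 1234, 3, 0, 43),
                 col2 = rep("something", 13))
myvector <- c(30, 43, 12, 333334, 14, 4351, 0, 5, 55, 66)

# One logical value per element of the LEFT operand
mask_df  <- df$col1 %in% myvector   # length 13, lines up with the rows of df
mask_vec <- myvector %in% df$col1   # length 10, lines up with myvector

# Repeat the 10 values by hand to cover all 13 rows, as recycling does
recycled <- rep_len(mask_vec, nrow(df))
which(recycled)                     # 1 2 6 7 11 12, the rows seen in trial2

# Use each mask only on the object it lines up with
rows_in_vector <- df[mask_df, ]         # rows of df whose col1 is in myvector
values_in_df   <- myvector[mask_vec]    # 30 43 4351 0
all(myvector %in% df$col1)              # FALSE: not every element is in df$col1
```

So to filter df, put df$col1 on the left. To filter or test myvector, put myvector on the left and index myvector itself, not df.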
