What is the difference between [ data$group == c("A","B")] and [ data$group == "

Time:11-11

I found that these two expressions give different results, but I don't know why.

I'd be grateful if someone who knows the reason could leave a comment.

Here's the problem.

For example,

data$group has 16 "A" values and 16 "B" values.

If I use c():

d1 <- subset(data, data$group == c("A","B"))

d1 contains only some of the rows, for example 12.

But if I use the other form:

d2 <- subset(data, data$group == "A" | data$group == "B")

d2 contains all of the rows.

What makes these two expressions different?

Answer:

I guess your data looks something like this:

data <- data.frame(
  group = sample(LETTERS[1:2], 16, TRUE)
)
data
#>    group
#> 1      A
#> 2      B
#> 3      A
#> 4      A
#> 5      B
#> 6      B
#> 7      B
#> 8      A
#> 9      A
#> 10     A
#> 11     B
#> 12     A
#> 13     A
#> 14     A
#> 15     A
#> 16     A

subset(data, data$group == c("A","B"))
#>    group
#> 1      A
#> 2      B
#> 3      A
#> 6      B
#> 9      A
#> 13     A
#> 15     A

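What happens: == recycles (repeats) the vector c("A","B") until it matches the length of data$group, and then compares the two vectors element by element. Odd-numbered rows are therefore compared with "A" and even-numbered rows with "B".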
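A minimal sketch of the recycling rule on a short vector. The second half uses two hypothetical vectors of 16 "A" and 16 "B" values (matching the counts in the question, but not the asker's real data) to show that the number of matches depends on the order of the rows, not only on which values are present.

```r
# The length-2 vector c("A", "B") is recycled to c("A", "B", "A", "B")
# before the element-wise comparison.
x <- c("A", "A", "B", "B")
recycled <- x == c("A", "B")
print(recycled)   # TRUE FALSE FALSE TRUE

# Same comparison with the recycling written out explicitly
explicit <- x == rep(c("A", "B"), length.out = length(x))
stopifnot(identical(recycled, explicit))

# Hypothetical data: same values, different order
grouped     <- rep(c("A", "B"), each = 16)   # 16 A's, then 16 B's
alternating <- rep(c("A", "B"), times = 16)  # A, B, A, B, ...

sum(grouped == c("A", "B"))      # 16: only half the rows line up
sum(alternating == c("A", "B"))  # 32: every row happens to line up
```

So the 12 rows in the question are simply the rows whose value happened to coincide with "A" at an odd position or "B" at an even position.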

This illustrates the result of subset(data, data$group == c("A","B")):

data.frame(
  group = data$group,                        # reuse the data above, not a new sample
  AB    = rep(c("A", "B"), 8),               # c("A", "B") recycled to length 16
  match = data$group == rep(c("A", "B"), 8)
)
#>    group AB match
#> 1      A  A  TRUE
#> 2      B  B  TRUE
#> 3      A  A  TRUE
#> 4      A  B FALSE
#> 5      B  A FALSE
#> 6      B  B  TRUE
#> 7      B  A FALSE
#> 8      A  B FALSE
#> 9      A  A  TRUE
#> 10     A  B FALSE
#> 11     B  A FALSE
#> 12     A  B FALSE
#> 13     A  A  TRUE
#> 14     A  B FALSE
#> 15     A  A  TRUE
#> 16     A  B FALSE

The match column shows which rows are TRUE, and those row numbers are exactly the rows returned by subset(data, data$group == c("A","B")). d2 <- subset(data, data$group == "A" | data$group == "B"), on the other hand, checks every element of data$group against both "A" and "B". That is TRUE for every value in the vector, so all rows are kept.
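If the intent is "keep rows whose group is A or B", base R's %in% operator expresses that directly and avoids recycling altogether. The sketch below assumes hypothetical shuffled data with 16 "A" and 16 "B" values, as described in the question; the exact row count of d1 depends on the shuffle.

```r
# Hypothetical data: 16 "A" and 16 "B" values in random order
set.seed(1)
data <- data.frame(group = sample(rep(c("A", "B"), each = 16)))

# Position-wise comparison with recycling: keeps only some rows
d1 <- subset(data, group == c("A", "B"))

# Membership tests: keep every row whose value is "A" or "B"
d2 <- subset(data, group == "A" | group == "B")
d3 <- subset(data, group %in% c("A", "B"))

nrow(d1)  # depends on the shuffle
nrow(d2)  # 32
nrow(d3)  # 32
stopifnot(identical(d2, d3))
```

%in% scales better than chaining | when there are many allowed values, since the set is written once as a vector.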

Created on 2021-11-11 by the reprex package (v2.0.1)
