I found a difference between these two approaches, but I don't know why they behave differently.
I'd be grateful if someone who knows the reason could leave a comment.
Here's the problem.
For example, data$group contains 16 character values of "A" and 16 of "B".
If I use the function c(),
d1 <- subset(data, data$group == c("A","B"))
d1 contains only part of the data, something like 12 rows.
But if I use the other form,
d2 <- subset(data, data$group == "A" | data$group == "B")
d2 has the same number of rows as the original data.
What makes these two behave differently?
CodePudding user response:
I guess this is what your data looks like:
data <- data.frame(
group = sample(LETTERS[1:2], 16, TRUE)
)
data
#> group
#> 1 A
#> 2 B
#> 3 A
#> 4 A
#> 5 B
#> 6 B
#> 7 B
#> 8 A
#> 9 A
#> 10 A
#> 11 B
#> 12 A
#> 13 A
#> 14 A
#> 15 A
#> 16 A
subset(data, data$group == c("A","B"))
#> group
#> 1 A
#> 2 B
#> 3 A
#> 6 B
#> 9 A
#> 13 A
#> 15 A
What happens: == recycles the shorter vector c("A","B") until it matches the length of data$group, so each element is compared against A, B, A, B, ... in turn rather than against both values.
This illustrates the result of subset(data, data$group == c("A","B")):
data.frame(
group = data$group, # reuse the existing column instead of drawing a new sample
AB = rep(c("A", "B"), 8),
match = data$group == rep(c("A", "B"), 8)
)
#> group AB match
#> 1 A A TRUE
#> 2 B B TRUE
#> 3 A A TRUE
#> 4 A B FALSE
#> 5 B A FALSE
#> 6 B B TRUE
#> 7 B A FALSE
#> 8 A B FALSE
#> 9 A A TRUE
#> 10 A B FALSE
#> 11 B A FALSE
#> 12 A B FALSE
#> 13 A A TRUE
#> 14 A B FALSE
#> 15 A A TRUE
#> 16 A B FALSE
The match column shows which rows are TRUE, and those row numbers match the row indices returned by subset(data, data$group == c("A","B")).
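To see the recycling in isolation, here is a minimal sketch on a short vector. The six-element group vector is made up for illustration and is not the data from the question.

```r
# Hypothetical six-element vector, chosen only for illustration
group <- c("A", "A", "B", "B", "A", "B")

# What `==` effectively compares against once c("A", "B") is recycled
recycled <- rep(c("A", "B"), length.out = length(group))

# Comparing against the short vector and against the explicit
# recycled vector gives the same logical result
by_recycling <- group == c("A", "B")
by_hand <- group == recycled

identical(by_recycling, by_hand)
#> [1] TRUE

# Only positions where the element happens to line up with A, B, A, B, ...
which(by_recycling)
#> [1] 1 4 5 6
```

Positions 2 and 3 are dropped even though they hold valid "A" and "B" values, because they are compared with "B" and "A" respectively.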
On the other hand, d2 <- subset(data, data$group == "A" | data$group == "B") compares every element of data$group with "A" and with "B". That is TRUE for all values in the vector, so every row is kept.
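As a side note that goes beyond the original answer: if the intent is "group is one of these values", %in% is the usual way to write it, since it tests membership instead of comparing element by element. A small sketch, with the group column typed in by hand from the printed data above:

```r
# Group values copied from the printed example data above
data <- data.frame(
  group = c("A", "B", "A", "A", "B", "B", "B", "A",
            "A", "A", "B", "A", "A", "A", "A", "A")
)

# Element-wise comparison against a recycled c("A", "B")
d_recycled <- subset(data, group == c("A", "B"))

# Explicit OR of two comparisons, as in the question
d_or <- subset(data, group == "A" | group == "B")

# Membership test; gives the same rows as the OR version
d_in <- subset(data, group %in% c("A", "B"))

nrow(d_recycled)
#> [1] 7
nrow(d_or)
#> [1] 16
identical(d_or, d_in)
#> [1] TRUE
```

With %in%, the vector on the right can grow to any number of values without chaining more | conditions.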
Created on 2021-11-11 by the reprex package (v2.0.1)