pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
fruit2 = c("banana", "melon", "papaya", "banana"))
> dat
fruit1 fruit2
1 melon banana
2 apple melon
3 mango papaya
4 apple banana
I want to find out if there's a match between pattern and the rows in dat. In the example above, there is a match in the 4th row of dat.
I tried using match, but that does not seem to work on data.frames. An alternative is to loop over each row of dat:
output <- logical(nrow(dat))
for (i in seq_len(nrow(dat))) {
  output[i] <- all(dat[i, ] %in% pattern)
}
> which(output)
[1] 4
This is inefficient if there are many rows in dat. Is there a faster way?
CodePudding user response:
You could filter the data like this:
dat |>
subset(fruit1 == pattern[1] & fruit2 == pattern[2])
# fruit1 fruit2
# 4 apple banana
If you just want the index:
which(colSums(t(dat) == pattern) == 2)
# [1] 4
Or, shorter:
which(!colSums(t(dat) != pattern))
# [1] 4
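In case it is not obvious why the transpose works, here is a minimal sketch of the same idea spelled out step by step. It assumes pattern has exactly one element per column of dat, in column order; the names m, n_equal and idx are only for illustration. It also uses length(pattern) instead of the hard-coded 2.

```r
pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
                  fruit2 = c("banana", "melon", "papaya", "banana"))

# t(dat) is a character matrix with one column per row of dat.
m <- t(dat)

# pattern is recycled down each column of m, so m == pattern compares
# every original row element-wise against pattern.
# This only lines up when length(pattern) equals ncol(dat).
stopifnot(length(pattern) == ncol(dat))
n_equal <- colSums(m == pattern)

# A row matches when all of its elements are equal to the pattern.
idx <- which(n_equal == length(pattern))
idx  # row 4 for the example data
```

Note that t() copies the whole data frame into a matrix, so for very wide or very large data a column-wise comparison may be cheaper.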
CodePudding user response:
Approach 1: "manual" approach with indexing
library(dplyr)
dat %>%
filter(fruit1 == pattern[1] & fruit2 == pattern[2])
#> fruit1 fruit2
#> 1 apple banana
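If you don't want to hard-code pattern[1] and pattern[2], the same comparison can be written for any number of columns in base R. This is only a sketch: it assumes pattern is given in the same order as the columns of dat, and eq_by_col and hits are names I made up.

```r
pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
                  fruit2 = c("banana", "melon", "papaya", "banana"))

# Compare each column with its corresponding pattern element.
# The result is a list with one logical vector per column.
eq_by_col <- Map(`==`, dat, pattern)

# Combine the per-column results: a row matches only if every column matches.
hits <- Reduce(`&`, eq_by_col)

which(hits)   # index of the matching rows (row 4 for the example data)
dat[hits, ]   # the matching rows themselves
```

Each comparison is vectorised over a whole column, so there is no loop over rows.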
Approach 2: create a unique key across both data sources, then match with %in%.
This is especially useful if you want to keep the key you matched on for later operations. You can drop it at the end with %>% select(-id).
ids <- paste0(pattern, collapse = "")
dat %>%
mutate(id = paste0(fruit1, fruit2)) %>%
filter(id %in% ids)
#> fruit1 fruit2 id
#> 1 apple banana applebanana
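One caveat with a pasted key: without a separator, different rows can collapse to the same string. A base R sketch of the same idea with a separator is below. It assumes the separator character never occurs in the data; sep, key_pattern and key_rows are illustrative names.

```r
pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
                  fruit2 = c("banana", "melon", "papaya", "banana"))

# Without a separator, two different pairs can give the same key:
paste0("ap", "plebanana") == paste0("apple", "banana")  # evaluates to TRUE

# Assumption: the ASCII unit separator does not appear in any value.
sep <- "\x1f"

# Build one key for the pattern and one key per row of dat.
key_pattern <- paste(pattern, collapse = sep)
key_rows <- do.call(paste, c(dat, sep = sep))

which(key_rows == key_pattern)  # row 4 for the example data
```

With several patterns you could build a vector of keys and use key_rows %in% keys instead of ==.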