pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
                  fruit2 = c("banana", "melon", "papaya", "banana"))

> dat
  fruit1 fruit2
1  melon banana
2  apple  melon
3  mango papaya
4  apple banana

I want to find out if there's a match between pattern and the rows in dat. In the example above, there is a match in the 4th row of dat.

I tried using match, but that does not seem to work on data.frames. An alternative is to loop over each row of dat:

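output <- logical(nrow(dat))  # preallocate instead of growing the vector
for (i in seq_len(nrow(dat))) {
  # TRUE when every value in row i appears somewhere in pattern
  output[i] <- all(dat[i, ] %in% pattern)
}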
> which(output)
[1] 4

This is inefficient if there are many rows in dat. Is there a faster way?

Answer 1:

You could filter the data like this:

dat |>
  subset(fruit1 == pattern[1] & fruit2 == pattern[2])

#   fruit1 fruit2
# 4  apple banana

If you just want the index:

which(colSums(t(dat) == pattern) == 2)
# [1] 4

or, more briefly (a row matches when no position differs from pattern):

which(!colSums(t(dat) != pattern))
# [1] 4
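
To see why the transpose trick works, here is a step-by-step sketch of the same idea. It assumes that pattern has exactly one element per column of dat and that there are no NA values; the intermediate names m, hits and idx are only for illustration. Using length(pattern) instead of the hard-coded 2 keeps it valid for wider data.

```r
pattern <- c("apple", "banana")
dat <- data.frame(fruit1 = c("melon", "apple", "mango", "apple"),
                  fruit2 = c("banana", "melon", "papaya", "banana"))

# t(dat) is a 2 x 4 character matrix: one column per original row of dat.
m <- t(dat)

# Matrices are stored column by column, so recycling `pattern` lines up
# pattern[1] with fruit1 and pattern[2] with fruit2 in every column.
hits <- m == pattern

# A row of dat matches when all of its positions agree with the pattern.
idx <- which(colSums(hits) == length(pattern))
idx
# [1] 4
```

Note that this comparison is positional, unlike the %in% loop in the question, which would also accept a row such as banana/apple or apple/apple.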

Answer 2:

Approach 1: "manual" approach with indexing

library(dplyr)

dat %>%
  filter(fruit1 == pattern[1] & fruit2 == pattern[2])
#>   fruit1 fruit2
#> 1  apple banana

Approach 2: create a unique key across both data sources, then match with %in%.

This can be especially useful if you want to keep the "ID" that you matched on for later operations. If you don't need it afterwards, drop it at the end with %>% select(-id).

ids <- paste0(pattern, collapse = "")

dat %>%
  mutate(id = paste0(fruit1, fruit2)) %>%
  filter(id %in% ids)
#>   fruit1 fruit2          id
#> 1  apple banana applebanana