My data frame has two string columns I want to compare. The second (V2) is a list. My DF looks like this:
V1       V2                                                          V3
oranges  c("oranges", "apples", "berries", "plums", "cherries")      1
apples   c("oranges", "apples", "berries", "bananas", "apples")      2
grapes   c("oranges", "apples", "berries", "plums", "cherries")      0
berries  c("berries", "apples", "berries", "plums", "cherries")      2
I want to check V1 row-wise against V2 and store in V3 the number of times the V1 string appears in that row's V2. I have tried the following code but end up with an empty data frame.
matches <- x[!x$V1 %in% x$V2]
CodePudding user response:
V1 <- c("oranges", "apples", "grapes", "berries")
V2 <- list(c("oranges", "apples", "berries", "plums", "cherries"),
           c("oranges", "apples", "berries", "bananas", "apples"),
           c("oranges", "apples", "berries", "plums", "cherries"),
           c("berries", "apples", "berries", "plums", "cherries"))
A straightforward solution is:
V3 <- mapply(function (x, y) sum(x == y), V1, V2)
#oranges  apples  grapes berries
#      1       2       0       2
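To get the counts back into the data frame as the question asks, the same mapply call can be assigned to a new column. This is only a sketch: the data frame name x comes from the question, but how it is built here (a list column wrapped in I()) is an assumption about the asker's data.

```r
# Assumed reconstruction of the asker's data frame; I() keeps V2 as a list column.
x <- data.frame(
  V1 = c("oranges", "apples", "grapes", "berries"),
  stringsAsFactors = FALSE
)
x$V2 <- I(list(
  c("oranges", "apples", "berries", "plums", "cherries"),
  c("oranges", "apples", "berries", "bananas", "apples"),
  c("oranges", "apples", "berries", "plums", "cherries"),
  c("berries", "apples", "berries", "plums", "cherries")
))

# Count, row by row, how often the V1 string occurs in that row's V2.
# unname() drops the names that mapply takes from the character vector V1.
x$V3 <- unname(mapply(function(v, l) sum(v == l), x$V1, x$V2))
```

unname() is optional; it only keeps the new column free of the element names mapply attaches.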
Note that == works here because V1 holds a single value in each row.
If every element of V2 has the same number of entries, I recommend:
V3 <- rowSums(V1 == do.call(rbind, V2))
#[1] 1 2 0 2
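The rbind approach relies on every element of V2 having the same length; with ragged elements, rbind would recycle the shorter vectors, which can inflate the counts. A minimal sketch of a per-row count that does not depend on equal lengths, using made-up ragged data (v1, v2 and counts are illustrative names, not from the question):

```r
# Hypothetical ragged input: the elements of v2 have different lengths.
v1 <- c("oranges", "grapes")
v2 <- list(c("oranges", "oranges", "plums"), c("apples"))

# Compare each v1 value only against its own v2 element;
# vapply guarantees one integer per row.
counts <- vapply(
  seq_along(v1),
  function(i) sum(v2[[i]] == v1[i]),
  integer(1)
)
counts
```

This is the same idea as the mapply solution, just with an explicit return type.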
CodePudding user response:
library(tidyverse)
df <- tibble::tribble(
  ~V1, ~V2,
  "oranges", c("oranges", "apples", "berries", "plums", "cherries"),
  "apples", c("oranges", "apples", "berries", "bananas", "apples"),
  "grapes", c("oranges", "apples", "berries", "plums", "cherries"),
  "berries", c("berries", "apples", "berries", "plums", "cherries"),
)
df %>%
  rowwise() %>%
  mutate(
    V3 = sum(V1 == V2)
  ) %>%
  ungroup()
#> # A tibble: 4 × 3
#>   V1      V2           V3
#>   <chr>   <list>    <int>
#> 1 oranges <chr [5]>     1
#> 2 apples  <chr [5]>     2
#> 3 grapes  <chr [5]>     0
#> 4 berries <chr [5]>     2
Created on 2022-07-25 by the reprex package (v2.0.1)