I have a dataset with pairs of items and want to count how often these pairs occur, regardless of their direction.
df <- data.frame(V1 = c("apple", "banana", "banana", "kiwi", "orange"), V2 = c("apple", "apple", "kiwi", "banana", NA))
This is what it looks like:
V1 V2
1 apple apple
2 banana apple
3 banana kiwi
4 kiwi banana
5 orange <NA>
And what I want to have is this:
V1 V2 count
1 apple apple 1
2 banana apple 1
3 banana kiwi 2
5 orange <NA> 1
It does not matter in which direction it goes. I tried using IDs (V1_V2 and V2_V1) and the %in% function, but I only manage to find the ones that occur in both columns and not count the pairs.
CodePudding user response:
df <- data.frame(
V1 = c("apple", "banana", "banana", "kiwi", "orange"),
V2 = c("apple", "apple", "kiwi", "banana", NA)
)
library(tidyverse)
df |>
rowwise() |>
mutate(g = paste(sort(c(V1, V2)), collapse = "")) |>
group_by(g) |>
summarise(
V1 = min(V1),
V2 = min(V2),
n = n()
) |>
select(-g)
CodePudding user response:
df <-
data.frame(
V1 = c("apple", "banana", "banana", "kiwi", "orange"),
V2 = c("apple", "apple", "kiwi", "banana", NA)
)
library(tidyverse)
df %>%
rowwise() %>%
mutate(tmp = paste(sort(c_across(everything())), collapse = "")) %>%
add_count(tmp) %>%
distinct(tmp, .keep_all = T) %>%
select(-tmp)
#> # A tibble: 4 x 3
#> # Rowwise:
#> V1 V2 n
#> <chr> <chr> <int>
#> 1 apple apple 1
#> 2 banana apple 1
#> 3 banana kiwi 2
#> 4 orange <NA> 1
Created on 2022-08-19 with reprex v2.0.2