Home > front end >  Count of reversable pairs in R
Count of reversable pairs in R

Time:08-20

I have a dataset with pairs of items and want to count how often these pairs occur, regardless of their direction.

df <- data.frame(V1 = c("apple", "banana", "banana", "kiwi", "orange"), V2 = c("apple", "apple", "kiwi", "banana", NA)) 

This is what it looks like:


      V1     V2
1  apple  apple
2 banana  apple
3 banana   kiwi
4   kiwi banana
5 orange   <NA>

And what I want to have is this:


      V1     V2 count
1  apple  apple     1
2 banana  apple     1
3 banana   kiwi     2
5 orange   <NA>     1

It does not matter in which direction it goes. I tried using IDs (V1_V2 and V2_V1) and the %in% function, but I only manage to find the ones that occur in both columns and not count the pairs.

CodePudding user response:

df <- data.frame(
  V1 = c("apple", "banana", "banana", "kiwi", "orange"),
  V2 = c("apple", "apple", "kiwi", "banana", NA)
)

library(tidyverse)

df |>
  rowwise() |>
  mutate(g = paste(sort(c(V1, V2)), collapse = "")) |>
  group_by(g) |>
  summarise(
    V1 = min(V1),
    V2 = min(V2),
    n = n()
  ) |>
  select(-g)

CodePudding user response:

df <-
  data.frame(
    V1 = c("apple", "banana", "banana", "kiwi", "orange"),
    V2 = c("apple", "apple", "kiwi", "banana", NA)
  ) 

library(tidyverse)
df %>% 
  rowwise() %>% 
  mutate(tmp = paste(sort(c_across(everything())), collapse = "")) %>% 
  add_count(tmp) %>% 
  distinct(tmp, .keep_all = T) %>% 
  select(-tmp)
#> # A tibble: 4 x 3
#> # Rowwise: 
#>   V1     V2        n
#>   <chr>  <chr> <int>
#> 1 apple  apple     1
#> 2 banana apple     1
#> 3 banana kiwi      2
#> 4 orange <NA>      1

Created on 2022-08-19 with reprex v2.0.2

  • Related