Home > front end >  Identify matched pairs and create a common key
Identify matched pairs and create a common key

Time:03-15

I have

df<-data.frame(record_id=c("A", "B", "C", "D", "E", "F"), var1=1:6, matched.with=rev(c("A", "B", "C", "D", "E", "F")))

> df
  record_id var1 matched.with
1         A    1            F
2         B    2            E
3         C    3            D
4         D    4            C
5         E    5            B
6         F    6            A

so, id A has been matched with F, B with E, C with D

I would like to create a common.key variable for the pairs like

df.common.key
  record_id var1 matched.with common.key
1         A    1            F          1
2         B    2            E          2
3         C    3            D          3
4         D    4            C          3
5         E    5            B          2
6         F    6            A          1

Which shows that A has been matched with F (row 1) with key 1 and F has been matched with A (row 6) with also key 1. Common key does not have to be numeric, it can also be string or factor.

How can I accomplish this 1) if my data frame includes only matched pairs, 2) if my DF also includes observations without matched pairs 3) is there a tidyverse solution?

CodePudding user response:

You could do it with a temporary variable that pastes the strings from record_id and matched.with in alphabetical order (row-wise), then groups by that variable and selects the first var1 to put in the common.key

df %>% 
  rowwise() %>% 
  mutate(common = paste0(sort(c(record_id, matched.with)), collapse = '')) %>% 
  group_by(common) %>% 
  mutate(common.key = first(var1)) %>% 
  ungroup() %>% 
  select(-common)
#> # A tibble: 6 x 4
#> record_id  var1 matched.with common.key
#>   <chr>     <int> <chr>             <int>
#> 1 A             1 F                     1
#> 2 B             2 E                     2
#> 3 C             3 D                     3
#> 4 D             4 C                     3
#> 5 E             5 B                     2
#> 6 F             6 A                     1

CodePudding user response:

We can get the unique combination of letters (in the same order) using pmin and pmax. Then, we can ust the grp column to create the common.key.

library(tidyverse)

df %>% 
  group_by(grp = paste0(pmin(record_id, matched.with), pmax(record_id, matched.with))) %>% 
  mutate(common.key = cur_group_id()) %>% 
  select(-grp)

Output

  grp   record_id  var1 matched.with common.key
  <chr> <chr>     <int> <chr>             <int>
1 AF    A             1 F                     1
2 BE    B             2 E                     2
3 CD    C             3 D                     3
4 CD    D             4 C                     3
5 BE    E             5 B                     2
6 AF    F             6 A                     1

CodePudding user response:

We may also use

library(dplyr)
library(stringr)
df %>% 
   group_by(grp = str_c(pmin(record_id, matched.with), 
       pmax(record_id, matched.with))) %>% 
   mutate(common.key = first(var1)) %>%
   ungroup %>% 
   select(-grp)

-output

# A tibble: 6 × 4
  record_id  var1 matched.with common.key
  <chr>     <int> <chr>             <int>
1 A             1 F                     1
2 B             2 E                     2
3 C             3 D                     3
4 D             4 C                     3
5 E             5 B                     2
6 F             6 A                     1

CodePudding user response:

With a join:

library(tidyverse)

df %>%   
  mutate(id = pmin(matched.with, record_id)) %>% 
  left_join(select(df, record_id, var1), by = c("id" = "record_id")) %>% 
  select(record_id, var1 = var1.x, matched.with, common.key = var1.y)

# A tibble: 6 x 4
  record_id  var1 matched.with common.key
  <chr>     <int> <chr>             <int>
1 A             1 F                     1
2 B             2 E                     2
3 C             3 D                     3
4 D             4 C                     3
5 E             5 B                     2
6 F             6 A                     1
  • Related