I have
df<-data.frame(record_id=c("A", "B", "C", "D", "E", "F"), var1=1:6, matched.with=rev(c("A", "B", "C", "D", "E", "F")))
> df
record_id var1 matched.with
1 A 1 F
2 B 2 E
3 C 3 D
4 D 4 C
5 E 5 B
6 F 6 A
so, id A
has been matched with F
, B
with E
, C
with D
I would like to create a common.key
variable for the pairs like
df.common.key
record_id var1 matched.with common.key
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1
Which shows that A
has been matched with F
(row 1) with key 1
and F
has been matched with A
(row 6) with also key 1
. Common key does not have to be numeric, it can also be string or factor.
How can I accomplish this 1) if my data frame includes only matched pairs, 2) if my DF also includes observations without matched pairs 3) is there a tidyverse solution?
CodePudding user response:
You could do it with a temporary variable that pastes the strings from record_id
and matched.with
in alphabetical order (row-wise), then groups by that variable and selects the first var1
to put in the common.key
df %>%
rowwise() %>%
mutate(common = paste0(sort(c(record_id, matched.with)), collapse = '')) %>%
group_by(common) %>%
mutate(common.key = first(var1)) %>%
ungroup() %>%
select(-common)
#> # A tibble: 6 x 4
#> record_id var1 matched.with common.key
#> <chr> <int> <chr> <int>
#> 1 A 1 F 1
#> 2 B 2 E 2
#> 3 C 3 D 3
#> 4 D 4 C 3
#> 5 E 5 B 2
#> 6 F 6 A 1
CodePudding user response:
We can get the unique combination of letters (in the same order) using pmin
and pmax
. Then, we can ust the grp
column to create the common.key
.
library(tidyverse)
df %>%
group_by(grp = paste0(pmin(record_id, matched.with), pmax(record_id, matched.with))) %>%
mutate(common.key = cur_group_id()) %>%
select(-grp)
Output
grp record_id var1 matched.with common.key
<chr> <chr> <int> <chr> <int>
1 AF A 1 F 1
2 BE B 2 E 2
3 CD C 3 D 3
4 CD D 4 C 3
5 BE E 5 B 2
6 AF F 6 A 1
CodePudding user response:
We may also use
library(dplyr)
library(stringr)
df %>%
group_by(grp = str_c(pmin(record_id, matched.with),
pmax(record_id, matched.with))) %>%
mutate(common.key = first(var1)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 6 × 4
record_id var1 matched.with common.key
<chr> <int> <chr> <int>
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1
CodePudding user response:
With a join:
library(tidyverse)
df %>%
mutate(id = pmin(matched.with, record_id)) %>%
left_join(select(df, record_id, var1), by = c("id" = "record_id")) %>%
select(record_id, var1 = var1.x, matched.with, common.key = var1.y)
# A tibble: 6 x 4
record_id var1 matched.with common.key
<chr> <int> <chr> <int>
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1