Create groups of pairs of IDs, with each group having IDs in common

I have the following data:

a <- structure(list(ID.x = c(1, 2, 3, 1, 6), ID.y = c(2, 4, 5, 3,
7), var.x = c(55, 82, 32, 94, 55), var.y = c(86, 24, 68, 63,
77)), class = "data.frame", row.names = c(NA, -5L))

> a
  ID.x ID.y var.x var.y
1    1    2    55    86
2    2    4    82    24
3    3    5    32    68
4    1    3    94    63
5    6    7    55    77

I need to create a variable such that all pairs of IDs that are connected, directly or through other pairs, by a common element in the ID variables get the same code.

For instance:

Pair 1 (1 and 2) has ID_group = 1
Pair 2 (2 and 4) has 2 in common with Pair 1, so it would have ID_group = 1
Pair 3 (3 and 5) has no element in common with Pairs 1 and 2, but it has 3 in common with Pair 4 (which in turn has 1 in common with Pair 1), so it should still have ID_group = 1
Pair 4 (1 and 3) would have ID_group = 1, as it has 1 in common with Pair 1
Pair 5 (6 and 7) has no element in common with any other pair, so it should have ID_group = 2 (or any other distinct value)

The desired outcome would be

> a
  ID.x ID.y var.x var.y ID_group
1    1    2    55    86       1
2    2    4    82    24       1
3    3    5    32    68       1
4    1    3    94    63       1
5    6    7    55    77       2

Some extra notes that might be helpful:

ID_group is categorical (and so are all the ID variables) and does not need to be sequential, just unique for each group. I am just using numbers as it has to be done with 22k entries.
If it helps conceptualising it, the IDs represent related individuals. ID_group would be an identifier of the family they belong to.
The data does not have to stay in this paired form; it can also be in "long" form (I need to pair/unpair it often).
A solution in R would be best, but I am more interested in the algorithm to obtain it than in the actual coding.

Thank you in advance for your help!

CodePudding user response:

One way is to treat each pair as an edge of a graph and use igraph to find its connected components:

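library(igraph)

# graph_from_data_frame() reads the first two columns (ID.x, ID.y) as edges;
# each connected component of that graph is one group
memb <- components(graph_from_data_frame(a))$membership
# memb is named by vertex ID, so look up each row's group via its ID.x
a$ID_group <- memb[as.character(a$ID.x)]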

  ID.x ID.y var.x var.y ID_group
1    1    2    55    86       1
2    2    4    82    24       1
3    3    5    32    68       1
4    1    3    94    63       1
5    6    7    55    77       2
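
As for the algorithm itself: the groups are the connected components of the graph whose vertices are the IDs and whose edges are the pairs. If you would rather not depend on igraph, one common way to compute them is a union-find (disjoint-set) structure. Below is a minimal base R sketch, not a tuned implementation; group_pairs is a hypothetical helper name introduced here for illustration, and it assumes both ID vectors have the same length and contain no NA values.

```r
# Hypothetical helper: returns one group label per pair (x[k], y[k]).
# Assumes x and y have equal length and no NA values.
group_pairs <- function(x, y) {
  ids <- unique(c(x, y))        # every distinct ID gets a position 1..n
  parent <- seq_along(ids)      # each ID starts as the root of its own set

  # Follow parent links until reaching a root (a node that is its own parent)
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  ix <- match(x, ids)
  iy <- match(y, ids)

  # Merge the sets containing the two endpoints of every pair
  for (k in seq_along(ix)) {
    rx <- find_root(ix[k])
    ry <- find_root(iy[k])
    if (rx != ry) parent[ry] <- rx
  }

  # Label each pair by the root of its first ID, renumbered 1, 2, ...
  roots <- vapply(ix, find_root, integer(1))
  match(roots, unique(roots))
}

a <- data.frame(ID.x = c(1, 2, 3, 1, 6), ID.y = c(2, 4, 5, 3, 7))
a$ID_group <- group_pairs(a$ID.x, a$ID.y)
```

On the example data this gives ID_group 1, 1, 1, 1, 2, matching the igraph result. The sketch skips path compression and union by rank, so for much larger inputs igraph (or a union-find with those optimisations) is likely the safer choice.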