I have the following data:
a <- structure(list(ID.x = c(1, 2, 3, 1, 6), ID.y = c(2, 4, 5, 3,
7), var.x = c(55, 82, 32, 94, 55), var.y = c(86, 24, 68, 63,
77)), class = "data.frame", row.names = c(NA, -5L))
> a
ID.x ID.y var.x var.y
1 1 2 55 86
2 2 4 82 24
3 3 5 32 68
4 1 3 94 63
5 6 7 55 77
I need to create a variable so that, given pairs of IDs, all the pairs that are connected by an element in common in the ID variables have the same code.
For instance:
- Pair 1 (1 and 2) has ID_group = 1
- Pair 2 (2 and 4) has 2 in common with Pair 1, so it would have ID_group = 1
- Pair 3 (3 and 5) has no element in common with Pair 1 or Pair 2, but it has 3 in common with Pair 4 (which in turn has 1 in common with Pair 1), so it should still be 1
- Pair 4 (1 and 3) would be in group 1, as it has 1 in common with Pair 1
- Pair 5 has no element in common with anything, so it should be 2 (or whatever)
The desired outcome would be
> a
ID.x ID.y var.x var.y ID_group
1 1 2 55 86 1
2 2 4 82 24 1
3 3 5 32 68 1
4 1 3 94 63 1
5 6 7 55 77 2
Some extra notes that might be helpful:
- ID_group is categorical (and so are all the ID variables) and does not need to be sequential, just unique for each group. I am just using numbers because it has to be done with 22k entries.
- If it helps conceptualising it, the IDs represent related individuals. ID_group would be an identifier of the family they belong to.
- The data does not have to stay in this paired form, but it can also be in "long" form (I need to pair/unpair it often).
- A solution in R would be best, but I am more interested in the algorithm to obtain it than in the actual coding
Thank you in advance for your help!
Answer:
One way, using igraph:
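library(igraph)
# Treat each row as an edge ID.x -- ID.y; the connected components are the groups
g <- graph_from_data_frame(a[, c("ID.x", "ID.y")], directed = FALSE)
memb <- components(g)$membership  # named vector: vertex ID -> component number
a$ID_group <- unname(memb[as.character(a$ID.x)])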
ID.x ID.y var.x var.y ID_group
1 1 2 55 86 1
2 2 4 82 24 1
3 3 5 32 68 1
4 1 3 94 63 1
5 6 7 55 77 2
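Since the question asks about the algorithm more than the code: what igraph computes here are the connected components of a graph whose vertices are the IDs and whose edges are the pairs. A minimal base-R sketch of the same idea with a union-find (disjoint-set) structure could look like the following. The function name group_pairs is made up for illustration and is not part of igraph or base R, and the sketch assumes the IDs can be compared as character strings.

```r
# Hypothetical helper (not from any package): label each pair with the
# connected component it belongs to, using union-find.
group_pairs <- function(x, y) {
  ids <- unique(c(as.character(x), as.character(y)))
  xi <- match(as.character(x), ids)   # integer index of each ID.x
  yi <- match(as.character(y), ids)   # integer index of each ID.y
  parent <- seq_along(ids)            # every ID starts as its own root

  # Walk up to the root, shortening the path along the way
  find <- function(i) {
    while (parent[[i]] != i) {
      parent[[i]] <<- parent[[parent[[i]]]]
      i <- parent[[i]]
    }
    i
  }

  # Union: merge the two sets joined by each pair
  for (k in seq_along(xi)) {
    rx <- find(xi[k])
    ry <- find(yi[k])
    if (rx != ry) parent[[ry]] <- rx
  }

  # Both IDs of a pair share a root, so looking up ID.x is enough
  roots <- vapply(xi, find, integer(1))
  match(roots, unique(roots))         # relabel roots as 1, 2, ...
}

# The pairs from the question
group_pairs(c(1, 2, 3, 1, 6), c(2, 4, 5, 3, 7))
```

With the data frame from the question this would be used as a$ID_group <- group_pairs(a$ID.x, a$ID.y). The labels are only guaranteed to be unique per group, which is all the question requires.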