So I have a dataset of parents and their children of the following form
Children_id Parent_id
10 1
11 1
12 1
13 2
14 2
What I want is a dataset of each child's siblings in long format, i.e.,
id sibling_id
10 11
10 12
11 10
11 12
12 10
12 11
13 14
14 13
What's the best way to achieve this, preferably using datatable?
Example data:
df <- data.frame("Children_id" = c(10,11,12,13,14), "Parent_id" = c(1,1,1,2,2))
CodePudding user response:
The graph experts out there will probably have better solutions, but here is a data.table
solution:
library(data.table)
setDT(df)[df,on=.(Parent_id), allow.cartesian=T] %>%
.[Children_id!=i.Children_id, .(id = i.Children_id, sibling=Children_id)]
Output:
id sibling
<num> <num>
1: 10 11
2: 10 12
3: 11 10
4: 11 12
5: 12 10
6: 12 11
7: 13 14
8: 14 13
CodePudding user response:
In base R
, we can use expand.grid
after split
ting
out <- do.call(rbind, lapply(split(df$Children_id, df$Parent_id), \(x)
subset(expand.grid(x, x), Var1 != Var2)[2:1]))
row.names(out) <- NULL
colnames(out) <- c("id", "sibling_id")
-output
> out
id sibling_id
1 10 11
2 10 12
3 11 10
4 11 12
5 12 10
6 12 11
7 13 14
8 14 13
Or using data.table
with CJ
library(data.table)
setDT(df)[, CJ(id = Children_id, sibling_id = Children_id),
Parent_id][id != sibling_id, .(id, sibling_id)]
id sibling_id
<num> <num>
1: 10 11
2: 10 12
3: 11 10
4: 11 12
5: 12 10
6: 12 11
7: 13 14
8: 14 13
CodePudding user response:
A dplyr
solution with inner_join
:
library(dplyr)
inner_join(df, df, by = "Parent_id") %>%
select(id = Children_id.x, siblings = Children_id.y) %>%
filter(id != siblings)
id siblings
1 10 11
2 10 12
3 11 10
4 11 12
5 12 10
6 12 11
7 13 14
8 14 13
or another strategy:
library(dplyr)
df %>%
group_by(Parent_id) %>%
mutate(siblings = list(unique(Children_id))) %>%
unnest(siblings) %>%
filter(Children_id != siblings)