Home > Back-end >  Combinations in long format by group
Combinations in long format by group

Time:05-08

So I have a dataset of parents and their children of the following form

Children_id   Parent_id
10            1
11            1
12            1
13            2
14            2

What I want is a dataset of each child's siblings in long format, i.e.,

id   sibling_id
10   11
10   12
11   10
11   12
12   10
12   11
13   14
14   13

What's the best way to achieve this, preferably using datatable?

Example data:

df <- data.frame("Children_id" = c(10,11,12,13,14), "Parent_id" = c(1,1,1,2,2))

CodePudding user response:

The graph experts out there will probably have better solutions, but here is a data.table solution:

library(data.table)

setDT(df)[df,on=.(Parent_id), allow.cartesian=T] %>% 
  .[Children_id!=i.Children_id, .(id = i.Children_id, sibling=Children_id)]

Output:

      id sibling
   <num>   <num>
1:    10      11
2:    10      12
3:    11      10
4:    11      12
5:    12      10
6:    12      11
7:    13      14
8:    14      13

CodePudding user response:

In base R, we can use expand.grid after splitting

out <- do.call(rbind, lapply(split(df$Children_id, df$Parent_id), \(x) 
     subset(expand.grid(x, x), Var1 != Var2)[2:1]))
row.names(out) <- NULL
colnames(out) <- c("id", "sibling_id")

-output

> out
  id sibling_id
1 10         11
2 10         12
3 11         10
4 11         12
5 12         10
6 12         11
7 13         14
8 14         13

Or using data.table with CJ

library(data.table)
setDT(df)[, CJ(id = Children_id, sibling_id = Children_id),
    Parent_id][id != sibling_id, .(id, sibling_id)]
      id sibling_id
   <num>      <num>
1:    10         11
2:    10         12
3:    11         10
4:    11         12
5:    12         10
6:    12         11
7:    13         14
8:    14         13

CodePudding user response:

A dplyr solution with inner_join:

library(dplyr)
inner_join(df, df, by = "Parent_id") %>% 
  select(id = Children_id.x, siblings = Children_id.y) %>% 
  filter(id != siblings)

  id siblings
1 10       11
2 10       12
3 11       10
4 11       12
5 12       10
6 12       11
7 13       14
8 14       13

or another strategy:

library(dplyr)
df %>% 
  group_by(Parent_id) %>% 
  mutate(siblings = list(unique(Children_id))) %>% 
  unnest(siblings) %>% 
  filter(Children_id != siblings)
  • Related