Home > Software design >  How can I stack my dataset so each observation relates to all other observations but itself?
How can I stack my dataset so each observation relates to all other observations but itself?

Time:11-08

I would like to stack my dataset so all observations relate to all other observations but itself. Suppose I have the following dataset:

df <- data.frame(id = c("a", "b", "c", "d" ),
                 x1 = c(1,2,3,4))

df
  id x1
1  a  1
2  b  2
3  c  3
4  d  4

I would like observation a to be related to b, c, and d. And the same for every other observation. The result should look like something like this:

   id x1 id2 x2
1   a  1   b  2
2   a  1   c  3
3   a  1   d  4
4   b  2   a  1
5   b  2   c  3
6   b  2   d  4
7   c  3   a  1
8   c  3   b  2
9   c  3   d  4
10  d  4   a  1
11  d  4   b  2
12  d  4   c  3

So observation a is related to b,c,d. Observation b is related to a, c,d. And so on. Any ideas?

CodePudding user response:

Another option:

library(dplyr)
left_join(df, df, by = character()) %>%
  filter(id.x != id.y) 

Or

output <- merge(df, df, by = NULL)
output = output[output$id.x != output$id.y,]

Thanks @ritchie-sacramento, I didn't know the by = NULL option for merge before, and thanks @zephryl for the by = character() option for dplyr joins.

CodePudding user response:

tidyr::expand_grid() accepts data frames, which can then be filtered to remove rows that share the id:

library(tidyr)
library(dplyr)

expand_grid(df, df, .name_repair = make.unique) %>%
  filter(id != id.1)

# A tibble: 12 × 4
   id       x1 id.1   x1.1
   <chr> <dbl> <chr> <dbl>
 1 a         1 b         2
 2 a         1 c         3
 3 a         1 d         4
 4 b         2 a         1
 5 b         2 c         3
 6 b         2 d         4
 7 c         3 a         1
 8 c         3 b         2
 9 c         3 d         4
10 d         4 a         1
11 d         4 b         2
12 d         4 c         3

CodePudding user response:

You can use combn() to get all combinations of row indices, then assemble your dataframe from those:

rws <- cbind(combn(nrow(df), 2), combn(nrow(df), 2, rev))

df2 <- cbind(df[rws[1, ], ], df[rws[2, ], ])

# clean up row and column names
rownames(df2) <- 1:nrow(df2)
colnames(df2) <- c("id", "x1", "id2", "x2")

df2
   id x1 id2 x2
1   a  1   b  2
2   a  1   c  3
3   a  1   d  4
4   b  2   c  3
5   b  2   d  4
6   c  3   d  4
7   b  2   a  1
8   c  3   a  1
9   d  4   a  1
10  c  3   b  2
11  d  4   b  2
12  d  4   c  3
  • Related